ASYMPTOTIC STABILITY OF THE WONHAM FILTER: ERGODIC 
AND NON-ERGODIC SIGNALS

Abstract. The stability problem of the Wonham filter with respect to initial conditions is addressed.
The case of ergodic signals is revisited in view of a gap in the classic work of H. Kunita (1971). We 
give new bounds for the exponential stability rates, which do not depend on the observations. In the 
non-ergodic case, the stability is implied by identifiability conditions, formulated explicitly in terms 
of the transition intensities matrix and the observation structure.

1. Introduction. The optimal filtering estimate of a signal from the record of
noisy observations is usually generated by a nonlinear recursive equation subject to
the signal a priori distribution. If the latter is unknown and the filtering equation is
initialized by an arbitrary initial distribution, the obtained estimate is suboptimal in
general. From an applications point of view, it is important to know whether such
an estimate becomes close to the optimal one at least after enough time elapses. This
property of filters to forget the initial conditions is far from being obvious and in fact
generally remains an open and challenging problem.

In this paper, we consider the filtering setting for signals with a finite state space. 
Specifically, let $X = (X_t)_{t\ge 0}$ be a continuous time homogeneous Markov chain
observed via

(1.1) $Y_t = \int_0^t h(X_s)\,ds + \sigma W_t$

with the Wiener process $W = (W_t)_{t\ge 0}$, independent of $X$, some bounded function $h$,
and $\sigma \ne 0$.

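We assume that $X_t$ takes values in the finite alphabet $\mathbb{S} = \{a_1, \dots, a_n\}$ and admits
several ergodic classes. Namely,

$\mathbb{S} = \mathbb{S}_1 \cup \dots \cup \mathbb{S}_m$,

where the sub-alphabets $\mathbb{S}_1, \dots, \mathbb{S}_m$ are noncommunicating in the sense that for any
$i \ne j$ and $t \ge s$

$P(X_t \in \mathbb{S}_j \mid X_s \in \mathbb{S}_i) = 0$.

So, unless $m = 1$, $X_t$ is a compound Markov chain with the transition intensities matrix

(1.3) $\Lambda = \mathrm{diag}(\Lambda_1, \dots, \Lambda_m)$ (block diagonal)

of $m$ ergodic classes and is not ergodic itself.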

The filtering problem consists in computation of the conditional distribution,

$\pi_t^\nu(1) = P(X_t^\nu = a_1 \mid \mathcal{Y}^\nu_{[0,t]}), \ \dots, \ \pi_t^\nu(n) = P(X_t^\nu = a_n \mid \mathcal{Y}^\nu_{[0,t]})$,

where $\mathcal{Y}^\nu_{[0,t]}$ is the filtration generated by $\{Y_s^\nu, 0 \le s \le t\}$ satisfying the usual
conditions (henceforth, the superscript $\nu$ is used to emphasize that the distribution of $X_0$
is $\nu$).

The vector-valued random process $\pi_t^\nu$ with entries $\pi_t^\nu(1), \dots, \pi_t^\nu(n)$ is generated
by the Wonham filter g5] (see also [S3 Chap. 9])

(1.4) $\pi_0^\nu = \nu$, $\quad d\pi_t^\nu = \Lambda^*\pi_t^\nu\,dt + \sigma^{-2}\big(\mathrm{diag}(\pi_t^\nu) - \pi_t^\nu(\pi_t^\nu)^*\big)h\,(dY_t^\nu - h^*\pi_t^\nu\,dt)$,

where diag(x) is the diagonal matrix with the diagonal $x \in \mathbb{R}^n$, $h$ is the column vector
with entries $h(a_1), \dots, h(a_n)$, and $*$ is the transposition symbol. If $\nu$ is unknown and
some other distribution $\beta$ (on $\mathbb{S}$) is used to initialize the filter, the "wrong" conditional
distribution $\pi_t^{\beta\nu}$ is obtained:

(1.5) $\pi_0^{\beta\nu} = \beta$, $\quad d\pi_t^{\beta\nu} = \Lambda^*\pi_t^{\beta\nu}\,dt + \sigma^{-2}\big(\mathrm{diag}(\pi_t^{\beta\nu}) - \pi_t^{\beta\nu}(\pi_t^{\beta\nu})^*\big)h\,(dY_t^\nu - h^*\pi_t^{\beta\nu}\,dt)$.

According to the intuitive notion of stability, given at the beginning of this section,
the filter defined in (1.5) is said to be asymptotically stable if

(1.6) $\lim_{t\to\infty} E\|\pi_t^\nu - \pi_t^{\beta\nu}\| = 0$,

where $\|\cdot\|$ is the total variation norm.
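The forgetting property in (1.6) can be observed numerically. The sketch below is not the continuous-time filter (1.4)-(1.5) itself but a simple time discretization chosen here for illustration: each step applies the one-step matrix $I + \Lambda^*\Delta t$, then a Bayes reweighting with the likelihood of the observation increment, and finally a normalization. The two-state chain, the values of $h$, $\sigma$, $\Delta t$, the seed, and all function names are assumptions made for the example and do not come from the paper.

```python
import math
import random


def filter_step(pi, lam, h, sigma, dt, dy):
    """One step of the discretized filter: predict, reweight, normalize."""
    n = len(pi)
    # Prediction: pi <- (I + Lambda^* dt) pi, where lam[r][i] is the intensity r -> i.
    pred = [pi[i] + dt * sum(lam[r][i] * pi[r] for r in range(n)) for i in range(n)]
    # Correction: weight exp(h dY / sigma^2 - h^2 dt / (2 sigma^2)) for each state.
    w = [pred[i] * math.exp((h[i] * dy - 0.5 * h[i] ** 2 * dt) / sigma ** 2)
         for i in range(n)]
    total = sum(w)
    return [v / total for v in w]


def simulate_filters(lam, h, sigma, nu, beta, dt, n_steps, seed=0):
    """Run the filter from the true prior nu and from a wrong prior beta.

    Returns the total variation distances between the two filters after each step.
    """
    rng = random.Random(seed)
    n = len(h)
    x = rng.choices(range(n), weights=nu)[0]  # draw X_0 from nu
    pi_true, pi_wrong = list(nu), list(beta)
    dists = []
    for _ in range(n_steps):
        # Signal: jump x -> j with probability lam[x][j] * dt (first-order scheme).
        u, acc = rng.random(), 0.0
        for j in range(n):
            if j != x:
                acc += lam[x][j] * dt
                if u < acc:
                    x = j
                    break
        # Observation increment dY = h(X) dt + sigma dW.
        dy = h[x] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        pi_true = filter_step(pi_true, lam, h, sigma, dt, dy)
        pi_wrong = filter_step(pi_wrong, lam, h, sigma, dt, dy)
        dists.append(sum(abs(a - b) for a, b in zip(pi_true, pi_wrong)))
    return dists


lam = [[-1.0, 1.0], [1.0, -1.0]]
d = simulate_filters(lam, h=[0.0, 1.0], sigma=1.0, nu=[0.2, 0.8],
                     beta=[0.9, 0.1], dt=0.01, n_steps=1000)
print(d[0], d[-1])
```

In this discretization both filters are driven by the same positive matrices, so the distance between them shrinks at every prediction step whatever the observation path is; the printed final distance is many orders of magnitude smaller than the initial one.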

If the state space of the Markov chain $X$ consists of one ergodic class ($m = 1$),
our setting is in the framework studied by Ocone and Pardoux [35]. In this case, there
exists the unique invariant distribution $\mu$, so that

(1.7) $\lim_{t\to\infty} \|S_t\gamma - \mu\| = 0$,

where $S_t$ is the semigroup corresponding to $X$ and $\gamma$ is an arbitrary probability
distribution on $\mathbb{S}$. Moreover

(1.8) $\lim_{t\to\infty} \int_{\mathbb{S}} |S_t f(x) - \mu(f)|\,d\mu(x) = 0$

holds for any bounded $f : \mathbb{S} \to \mathbb{R}$. So, it may seem that it remains only to assume

(1.9) $\nu \ll \beta$

and allude to [35]. However, the proof given in [35] uses as its central argument
the uniqueness theorem for the stationary measure of the filtering process $\pi_t^\nu$ which
appeared in the work of H. Kunita [23]. Unfortunately, the proof of this theorem
(Theorem 3.3 in [23]) contains a serious gap, as elaborated in the next section.

A different approach to the stability analysis of the filters for ergodic signals was
initiated by Delyon and Zeitouni [20]. The authors studied the top Lyapunov exponent
of the filtering equation

$\gamma_\sigma(\beta', \beta'') = \varlimsup_{t\to\infty} \frac{1}{t} \log \|\pi_t^{\beta'\nu} - \pi_t^{\beta''\nu}\|$, $\quad \beta'$ and $\beta''$ distributions on $\mathbb{S}$,

and showed that $\gamma_\sigma(\beta', \beta'') < 0$ when $\Lambda$ and $h$ satisfy certain conditions. Moreover
the filter is found to be stable in the low signal-to-noise regime:
$\lim_{\sigma\to\infty} \gamma_\sigma(\beta', \beta'') \le \Re[\lambda_{\max}(\Lambda)]$
with $\lambda_{\max}(\Lambda)$ being the eigenvalue of $\Lambda$ with the largest nonzero real part.

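These results were further extended by Atar and Zeitouni, where it is shown
that uniformly in $\sigma > 0$ and $h$

(1.10) $\gamma_\sigma(\beta', \beta'') \le -2 \min_{p \ne q} \sqrt{\lambda_{pq}\lambda_{qp}}$, a.s.,

and the high signal-to-noise asymptotics are obtained:

$\varlimsup_{\sigma\to 0} \sigma^2\gamma_\sigma \le -\frac{1}{2} \sum_{i=1}^d \mu_i \min_{j \ne i} [h(a_i) - h(a_j)]^2$,

$\varliminf_{\sigma\to 0} \sigma^2\gamma_\sigma \ge -\frac{1}{2} \sum_{i=1}^d \mu_i \sum_{j=1}^d [h(a_i) - h(a_j)]^2$,

where $\mu$ is the ergodic measure of $X$.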

The method in [3] (and its full development in [2]) does not rely on [23] and is
based on the analysis of the Zakai equation corresponding to (1.4) (see (5.2) below).
The analysis is carried out by means of the Hilbert projective metric and the Birkhoff
inequality, etc.; see section 5 for more details. This approach has proved its efficiency
in several filtering scenarios (see pQ, ).

Other results and methods related to the filtering stability can be found in 0|, 

DH, E2, [13, d], pi], mi, Q2], CHI, [23, El, E3, H3, E3, E3- The linear 

Kalman— Bucy case, being the most understood, is extensively treated by several 
authors: , [32], [33], [201, EH1 , ESI, EDI (sections 14.6 and 16.2). 

In the present paper, we consider both ergodic and non-ergodic signals. Applying
the technique from Atar and Zeitouni [2], we show that in the ergodic case the
asymptotic stability holds true without any additional assumptions. In other words,
the conclusion of H. Kunita [23] is valid in the specific case under consideration.

In view of the counterexample given in section 3, it is clear that in general $\gamma_\sigma$
may vanish at $\sigma = 0$. So, it is interesting to find out which ergodic properties of the
signal are inherited by the filter regardless of the specific observation structure. In
this connection we prove the inequality

$\varlimsup_{t\to\infty} \frac{1}{t} \log \|\pi_t^{\beta\nu} - \pi_t^\nu\| \le -\sum_{r=1}^n \mu_r \min_{i \ne r} \lambda_{ri}$.


Since $\mu$ is the positive measure on $\mathbb{S}$, unlike (1.10), this bound remains negative if at
least one row of $\Lambda$ has all nonzero entries.

Also we give the nonasymptotic bound (compare with (1.10))

$\|\pi_t^{\beta\nu} - \pi_t^\nu\| \le C \exp\big(-2t \min_{p \ne q} \sqrt{\lambda_{pq}\lambda_{qp}}\big)$

with some positive constant $C$ depending on $\nu$ and $\beta$ only.

For the discrete time case, related results can be found in Del Moral and Guionnet 
[T%j and Le Gland and Mevel [21] ■ For example, in the positiveness assumption 
for all transition probabilities is relaxed under certain constraints on the observation 
process noise density. 

In the case of nonergodic signal, m > 1, we show that the filtering stability holds 
true if the ergodic classes can be identified via observations and the filter matched to 
each class is stable. We formulate explicit sufficient identifiability conditions in terms 
of A and h. 

The paper is organized as follows. In section 2 we introduce the necessary
notations and clarify the role of condition $\nu \ll \beta$ in the filtering stability (Proposition
2.1). This section also gives a link to the gap in Kunita's proof [23], while in section
3 the filtering setting is described for which the stability fails and the gap becomes
evident.

The main results are formulated in section 4 and proved in the sections that follow.

2. Preliminaries and connection to the gap in [23].

2.1. Notations. Throughout, $\nu \ll \beta$ is assumed.

In order to explain our approach, let us consider a general setting when $(X, Y)$
is a Markov process with paths from the Skorokhod space $D = D_{[0,\infty)}(\mathbb{R}^2)$ of right
continuous functions having limits to the left. Moreover, the signal component
$X$ is a Markov process itself.

We introduce a measurable space $(D, \mathcal{D})$, where $\mathcal{D} = \sigma\{(x_s, y_s), s \ge 0\}$ is the
Borel $\sigma$-algebra on $D$. Let $\mathbf{D} = (\mathcal{D}_t)_{t\ge 0}$ be the filtration of $\mathcal{D}_t = \sigma\{(x_s, y_s), s \le t\}$
and let $\mathbf{D}^y = (\mathcal{D}_t^y)_{t\ge 0}$ be the filtration of $\mathcal{D}_t^y = \sigma\{y_s, s \le t\}$.

As before, we write $(X_t^\nu, Y_t^\nu)$ and $(X_t^\beta, Y_t^\beta)$, when the distribution of $X_0$ is $\nu$ or $\beta$
respectively, meaning that both pairs are defined on the same probability space, have
the same transition semigroup, but different initial distributions.

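For a bounded measurable function $f$, we introduce $\pi_t^\nu(f) := E(f(X_t^\nu) \mid \mathcal{Y}^\nu_{[0,t]})$
and $\pi_t^\beta(f) := E(f(X_t^\beta) \mid \mathcal{Y}^\beta_{[0,t]})$. Since $\pi_t^\nu(f)$ and $\pi_t^\beta(f)$ are $\mathcal{Y}^\nu_{[0,t]}$- and $\mathcal{Y}^\beta_{[0,t]}$-measurable
random variables respectively, it is convenient to identify $\pi_t^\nu(f)$ and $\pi_t^\beta(f)$ with some
$\mathcal{D}_t^y$-measurable functionals of trajectories $Y^\nu_{[0,t]} = \{Y_s^\nu, s \le t\}$ and $Y^\beta_{[0,t]} = \{Y_s^\beta, s \le t\}$.

For this purpose, let $Q^\nu$ and $Q^\beta$ denote the distributions of $(X^\nu, Y^\nu)$ and $(X^\beta, Y^\beta)$
on $(D, \mathcal{D})$ respectively and $Q_t^\nu$ and $Q_t^\beta$ be their restrictions on $[0, t]$, so that $Q_0^\nu$, $Q_0^\beta$
are the distributions of $(X_0^\nu, Y_0^\nu)$, $(X_0^\beta, Y_0^\beta)$. We also assume that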

<-) 

Since $(X^\nu, Y^\nu)$ and $(X^\beta, Y^\beta)$ have the same transition law, we have $Q^\nu \ll Q^\beta$ with

$\frac{dQ^\nu}{dQ^\beta}(x, y) = \frac{d\nu}{d\beta}(x_0)$.

Without loss of generality, we assume that the filtrations $\mathbf{D}$ and $\mathbf{D}^y$ satisfy the
general conditions with respect to $(Q^\nu + Q^\beta)/2$.

For fixed $t$, let $H_t^\beta(y)$ be a $\mathcal{D}_t^y$-measurable functional so that $H_t^\beta(Y^\beta) = \pi_t^\beta(f)$, a.s.
Moreover, due to $Q^\nu \ll Q^\beta$, a version of $H_t^\beta(y)$ can be chosen such that the random
variable $H_t^\beta(Y^\nu)$ is well defined. Then, we identify $\pi_t^{\beta\nu}(f)$ with $H_t^\beta(Y^\nu)$.

We do not assume that f3 -C v (and thus Q@ <jt. Q^), so this construction fails for 
7iy (/). Nevertheless, a version of H^{y) can be chosen such that H^{Y U ) = TT^(f) 
a.s. and used for the definition of 7r^(/). Indeed, let an d Q" be the distributions 
of Y v and V 3 respectively, i.e., the marginal distributions of and Q v . Obviously, 
Q U <C cf as well as <C Qf; the restrictions of Q' and on the interval [0,t 
Moreover, ^{Y p ) = £($(^o )|^[o,t])- Now definc 



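We introduce the decreasing filtration $\mathcal{X}^\beta_{[t,\infty)} = \sigma\{X_s^\beta, s \ge t\}$, the tail $\sigma$-algebra

(2.2) $\mathcal{T}(X^\beta) = \bigcap_{t \ge 0} \mathcal{X}^\beta_{[t,\infty)}$,

and $\sigma$-algebras $\mathcal{X}_t^\beta = \sigma\{X_t^\beta\}$, $\mathcal{Y}^\beta_{[0,\infty)} = \bigvee_{t \ge 0} \mathcal{Y}^\beta_{[0,t]}$.
Set

(2.3) $\pi_t^{\beta_0}(f) = E\big(f(X_t^\beta) \mid \mathcal{Y}^\beta_{[0,t]} \vee \mathcal{X}_0^\beta\big)$.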



2.2. Filter stability. For bounded and measurable $f$, the estimate $\pi_t^\nu(f)$ is
asymptotically stable with respect to $\beta$, if

(2.4) $\lim_{t\to\infty} E|\pi_t^\nu(f) - \pi_t^{\beta\nu}(f)| = 0$.

Note that, when the signal process takes values in a finite alphabet and (2.4) holds
for any bounded $f$, then (2.4) and (1.6) are equivalent.

We establish below that (2.4) holds, if for large values of $t$ the additional measurement
$X_0^\beta$ is useless for estimation of $f(X_t^\beta)$ via $Y^\beta_{[0,t]}$, or, analogously, if the additional
measurement $X_t^\beta$ is useless for estimation of $\frac{d\nu}{d\beta}(X_0^\beta)$ via $Y^\beta_{[0,t]}$.

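Proposition 2.1. Assume $\nu \ll \beta$. Then, any of the conditions
1.
(2.5) $\lim_{t\to\infty} E|\pi_t^\beta(f) - \pi_t^{\beta_0}(f)| = 0$,
2.
(2.6) $E\big(\frac{d\nu}{d\beta}(X_0^\beta) \mid \mathcal{Y}^\beta_{[0,\infty)}\big) = E\big(\frac{d\nu}{d\beta}(X_0^\beta) \mid \bigcap_{t\ge 0} \mathcal{Y}^\beta_{[0,\infty)} \vee \mathcal{X}^\beta_{[t,\infty)}\big)$, a.s.,
provides (2.4).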

Proof. Let us first show that, under u <^ /3, for any bounded / 



(2.7) 

= £ 



%f (/)-<(/)) 



E 



G
Write



E\4" (/) - <(f)\ = E^(Xg)\*?(f) nf(f)\ 



EE 



{%^)K^\4u)-^{f)\ 



E 



E 



dft 
/dv 



dv 



Vd/3 

So, it remains to show



(^)|< t] )^(/(^)l*) " <i(*oV(/)«, t] ) 



(2.8) 



E 



,d/3 



dz^ 
d/3~ ( 



With ^f-measurable and bounded function ^t{y) we get 



dz^ 



(* t (y")<(/)) =s(* t (y)/(xn) = J B(* t (y )^(^)/(xf)), 



and notice that (2.8) is valid by the arbitrariness of $\psi_t$.
The proof of (2.5) $\Rightarrow$ (2.4). Using (2.7) and



dz/ 
d/3~ ( 



we derive 



E\nt(f) - <(/)| = E\E&{Xg)\9g A )*?U) ~ K^oVH/) 



E 



E{^M{f)-^u))K 



<E^(xN^(f)-^(f)\, 



,dv 



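where the Jensen inequality has been used. Let for definiteness $|f| \le K$ with some
constant $K$. Then $\pi_t^\beta(f)$, $\pi_t^{\beta_0}(f)$ can also be chosen such that $|\pi_t^\beta(f)|$ and $|\pi_t^{\beta_0}(f)|$
are bounded by $K$. Hence, for any $C > 0$, we have

$E|\pi_t^{\beta\nu}(f) - \pi_t^\nu(f)| \le C\,E|\pi_t^\beta(f) - \pi_t^{\beta_0}(f)| + 2K\,P\big(\frac{d\nu}{d\beta}(X_0^\beta) > C\big)$.

Therefore, $\varlimsup_{t\to\infty} E|\pi_t^{\beta\nu}(f) - \pi_t^\nu(f)| \le 2K\,P\big(\frac{d\nu}{d\beta}(X_0^\beta) > C\big)$ and by the Chebyshev
inequality $P\big(\frac{d\nu}{d\beta}(X_0^\beta) > C\big) \le C^{-1} \to 0$, $C \to \infty$.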

The proof of (2.6) $\Rightarrow$ (2.4). By

e\4 v (/)-<(/)! 



/(*f)|^)l<, 



Notice also 



K/(*f )|(^)i^) - ^/(*f v 



dz; 



d/3 



d/3 1 



1^/3 

lHo,t]
Since $|f| \le K$, by the Jensen inequality we have



(2.9) E\4» (/)-<(/)! 



< KE 



[0,t] 



Both random processes E(%(X^ A ) and E(^(X^ >oo) V ^ >oo) 



[t,oo) 



-0 



are uniformly integrable forward and backward martingales with respect to the filtrations
$(\mathcal{Y}^\beta_{[0,t]})_{t\ge 0}$ and $(\mathcal{Y}^\beta_{[0,\infty)} \vee \mathcal{X}^\beta_{[t,\infty)})_{t\ge 0}$. Therefore, they admit limits a.s.



[0,. 



in f 



and mn£(^(^)|^ i0o) V respectively. By ^ 



lim 

i — >-oo 



[o,/; 



(iw-,v*i 



/3 

[t,oo; 



0. 



We show also that 



(2.10) 



lim E 

t—>oo 



Denote by a t any of £ |3[J t] ) and s(||(x£)|^ 



/3 w 

[0,oo) [*)°°] 



0. 



and 



lim at- 



It is clear that (2.10) holds true, if $\lim_{t\to\infty} E|\alpha_t - \alpha_\infty| = 0$. Since $\lim_{t\to\infty} \alpha_t = \alpha_\infty$,
$\alpha_t \ge 0$ and $E\alpha_t = E\alpha_\infty = 1$, by the Scheffé theorem we get the desired property.
Thus the right hand side of (2.9) converges to zero and the result follows. □

2.3. Connection to the gap in [23]. In [23], H. Kunita studies¹ ergodic properties
of the filtering process $\pi_t^\nu$. He considers $\pi_t^\nu$ as a Markov process with values in
the space of probability measures and claims (in Theorem 3.3) that there exists the
unique invariant measure being "limit point" of marginal distributions of $\pi_t^\nu$, $t \nearrow \infty$.
As was later shown, this result is the key to the stability analysis under (1.8).

Below we demonstrate that the main argument, used in the proof of Theorem 3.3
of [23], cannot be taken for granted. We discuss this issue in the context of Proposition
2.1. Suppose the Markov process $X$ is ergodic in the sense of (1.7) and (1.8). It is well
known that its tail $\sigma$-algebra $\mathcal{T}(X^\beta)$ (see (2.2) for definition) is empty almost surely.
It is very tempting in this case to change the order of intersection and supremum as
follows:

(2.11) $\bigcap_{t\ge 0} \big(\mathcal{Y}^\beta_{[0,\infty)} \vee \mathcal{X}^\beta_{[t,\infty)}\big) = \mathcal{Y}^\beta_{[0,\infty)} \vee \mathcal{T}(X^\beta)$, a.s.

Then, the right-hand side of (2.6) is transformed to

$E\big(\frac{d\nu}{d\beta}(X_0^\beta) \mid \bigcap_{t\ge 0} \mathcal{Y}^\beta_{[0,\infty)} \vee \mathcal{X}^\beta_{[t,\infty)}\big) = E\big(\frac{d\nu}{d\beta}(X_0^\beta) \mid \mathcal{Y}^\beta_{[0,\infty)}\big)$

and (2.6) would be correct, regardless (!) of any other ingredients of the problem
(e.g., with $\sigma = 0$ in (1.1)).

¹ The notations of this paper are used here.

In [23], the relation of (2.11) type plays the key role in verification of the uniqueness
for the invariant measure corresponding to $\pi_t^\nu$, $t \ge 0$. However, the validity of
(2.11) is far from being obvious. According to Williams 01], it "...tripped up even
Kolmogorov and Wiener" (see Y. Sinai (23 p. 837] for some details). The reader can
find a discussion concerning (2.11) in Weizsacker; unfortunately, the counterexample
there is incorrect. A proper counterexample to (2.11) is given in Exercise 4.12
in Williams, which, however, seems somewhat artificial in the filtering context.
It turns out that the example, considered by Delyon and Zeitouni in [20] (see [22] by
Kaijser for its earlier discrete-time version), is nothing but another case when (2.11)
fails.

For the reader's convenience, we give below a detailed analysis of this example.

It is important to note that the counterexamples mentioned above do not fit
exactly to the setup, considered by Kunita. They merely indicate that (2.11) is not
evident and so the claim of Theorem 3.3 in [23] remains a conjecture.

Generally, the stability of nonlinear filters for ergodic Markov processes remains 
an open problem, and some results [21], 001 , EH EH based on [22] have 
to be revised. 

3. Counterexample. Below we give a detailed discussion of one counterexample
to (2.11). Consider Markov process $X$ with values in $\mathbb{S} = \{1, 2, 3, 4\}$, with the initial
distribution $\nu$ and the transition intensities matrix

(3.1) $\Lambda = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{pmatrix}$.

All states of $\Lambda$ communicate and so $X$ is an ergodic Markov process (see, e.g., [34])
with the unique invariant measure $\mu = (1/4, 1/4, 1/4, 1/4)$. Let $h(x) = I(x = 1) + I(x = 3)$,
that is,

$Y_t = \int_0^t [I(X_s = 1) + I(X_s = 3)]\,ds + \sigma W_t$.

By Theorem 4.1 below, the filter is stable in this case for any $\sigma > 0$.

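3.1. Noiseless observation. Consider the case $\sigma = 0$.

It will be convenient to redefine the observation process as follows:

$Y_t = I(X_t = 1) + I(X_t = 3)$.

We assume $\nu \ll \beta$ and notice that (2.1) holds true. We omit the superscripts $\nu$ and $\beta$,
when the initial condition does not play a significant role. Since $X$ is an ergodic Markov
process, satisfying (1.8), $\mathcal{T}(X) = (\Omega, \emptyset)$, a.s.

Proposition 3.1.

$\bigcap_{t\ge 0} \big(\mathcal{Y}_{[0,\infty)} \vee \mathcal{X}_{[t,\infty)}\big) \supsetneq \mathcal{Y}_{[0,\infty)}$, a.s.

Proof. It suffices to show that $X_0$ is a $\bigcap_{t\ge 0} \big(\mathcal{Y}_{[0,\infty)} \vee \mathcal{X}_{[t,\infty)}\big)$-measurable random
variable and at the same time $X_0$ is not $\mathcal{Y}_{[0,\infty)}$-measurable.

The structure of matrix $\Lambda$ admits only cyclic transitions in the following order

$\cdots \to \{3\} \to \{4\} \to \{1\} \to \{2\} \to \{3\} \to \cdots$

So, since $Y$ and $X$ jump simultaneously, $X_0$ can be recovered exactly from the
trajectory $Y_s$, $s \le t$ and $X_t$ for any $t \ge 0$, i.e., $X_0$ is $\mathcal{X}_t \vee \mathcal{Y}_{[0,t]}$-measurable. Owing to
$\mathcal{X}_t \vee \mathcal{Y}_{[0,t]} \subseteq \mathcal{X}_{[t,\infty)} \vee \mathcal{Y}_{[0,\infty)}$, $X_0$ is measurable with respect to

$\bigcap_{t\ge 0} \big(\mathcal{Y}_{[0,\infty)} \vee \mathcal{X}_{[t,\infty)}\big)$.

Denote by $(\tau_i)_{i\ge 1}$ the time moments where $Y$ jumps. It is not hard to check that
$(\tau_i)_{i\ge 1}$ is independent of $(X_0, Y_0)$ and moreover

(3.3) $\mathcal{Y}_{[0,t]} = \bigvee_{i\ge 1} \sigma\{\tau_i \le t\} \vee \sigma\{Y_0\}$.

Thus for any $t \ge 0$

$P(X_0 = 1 \mid \mathcal{Y}_{[0,t]}) = P\big(X_0 = 1 \mid \bigvee_{i\ge 1} \sigma\{\tau_i \le t\} \vee \sigma\{Y_0\}\big) = P(X_0 = 1 \mid Y_0) = \frac{\nu_1}{\nu_1 + \nu_3} Y_0$.

Since (3.3) is valid for any $t \ge 0$, we conclude that

$P(X_0 = 1 \mid \mathcal{Y}_{[0,\infty)}) = \frac{\nu_1}{\nu_1 + \nu_3} Y_0$.

Obviously $I(X_0 = 1) \ne \frac{\nu_1}{\nu_1 + \nu_3} Y_0$ and thus $X_0$ is not $\mathcal{Y}_{[0,\infty)}$-measurable. □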

3.2. Invariant measures of $\pi_t$ and the filter instability. Write $I_t(i) = I(X_t = i)$. Since
$I_t(2) + I_t(4) = 1 - Y_t$ and $I_t(1) + I_t(3) = Y_t$, only $I_t(1)$ and $I_t(2)$ have to be filtered while
$\pi_t(3) = Y_t - \pi_t(1)$ and $\pi_t(4) = (1 - Y_t) - \pi_t(2)$. The derivation of the filtering equations
is sketched in the appendix.

Proposition 3.2. The optimal filtering estimate satisfies

$d\pi_t(1) = (1 - \pi_{t-}(2))(1 - Y_{t-})\,dY_t + \pi_{t-}(1)Y_{t-}\,dY_t$,

$d\pi_t(2) = -\pi_{t-}(2)(1 - Y_{t-})\,dY_t - \pi_{t-}(1)Y_{t-}\,dY_t$

subject to $\pi_0(1) = \frac{\nu_1}{\nu_1 + \nu_3} Y_0$, $\pi_0(2) = \frac{\nu_2}{\nu_2 + \nu_4}(1 - Y_0)$.

Let us examine the behavior of the filter from Proposition 3.2. A pair of typical
trajectories are given in Table 3.1 (for $Y_0 = 1$) and Table 3.2 (for $Y_0 = 0$).

Table 3.1
Typical trajectory of $\pi_t$ for $Y_0 = 1$.

  t        | [0,τ1)       | [τ1,τ2)      | [τ2,τ3)      | [τ3,τ4)      | [τ4,τ5)
  Y_t      | 1            | 0            | 1            | 0            | 1
  π_t(1)   | ν1/(ν1+ν3)   | 0            | ν3/(ν1+ν3)   | 0            | ν1/(ν1+ν3)
  π_t(2)   | 0            | ν1/(ν1+ν3)   | 0            | ν3/(ν1+ν3)   | 0

Table 3.2
Typical trajectory of $\pi_t$ for $Y_0 = 0$.

  t        | [0,τ1)       | [τ1,τ2)      | [τ2,τ3)      | [τ3,τ4)      | [τ4,τ5)
  Y_t      | 0            | 1            | 0            | 1            | 0
  π_t(1)   | 0            | ν4/(ν2+ν4)   | 0            | ν2/(ν2+ν4)   | 0
  π_t(2)   | ν2/(ν2+ν4)   | 0            | ν4/(ν2+ν4)   | 0            | ν2/(ν2+ν4)

It is not hard to see that $Y$ is itself a Markov chain with values in $\{0, 1\}$ and the
transition intensities matrix $\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}$ and thus its invariant measure is $\mu' = (1/2, 1/2)$.
Hence, the invariant measure $\Phi$ of the filtering process $(\pi_t(1), \pi_t(2))$ is concentrated
on eight vectors

$\phi_1 = \big(\frac{\nu_1}{\nu_1+\nu_3}, 0\big)^*$, $\phi_2 = \big(0, \frac{\nu_1}{\nu_1+\nu_3}\big)^*$, $\phi_3 = \big(\frac{\nu_3}{\nu_1+\nu_3}, 0\big)^*$, $\phi_4 = \big(0, \frac{\nu_3}{\nu_1+\nu_3}\big)^*$,

$\phi_5 = \big(0, \frac{\nu_2}{\nu_2+\nu_4}\big)^*$, $\phi_6 = \big(\frac{\nu_4}{\nu_2+\nu_4}, 0\big)^*$, $\phi_7 = \big(0, \frac{\nu_4}{\nu_2+\nu_4}\big)^*$, $\phi_8 = \big(\frac{\nu_2}{\nu_2+\nu_4}, 0\big)^*$

with

$\Phi(\phi_i) = (\nu_1 + \nu_3)/4$, $i = 1, 2, 3, 4$,
$\Phi(\phi_i) = (\nu_2 + \nu_4)/4$, $i = 5, 6, 7, 8$,

and, consequently, $\Phi$ is not unique. Moreover, the optimal filter is not stable in the
sense of (1.6). In fact, for different initial conditions, the filtering distribution $\pi_t$, $t \ge 0$,
can "sit" on different vectors!
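The instability can be checked by iterating the jump recursion of Proposition 3.2 directly. The sketch below is only an illustration: the particular priors, the use of exact fractions, and the function name `noiseless_filter_path` are assumptions made here, not part of the paper. It runs the recursion for the true prior $\nu$ and for a uniform "wrong" prior $\beta$ along the same observation path with $Y_0 = 1$ and records the gap between the filtered coordinates $(\pi_t(1), \pi_t(2))$ on each inter-jump interval.

```python
from fractions import Fraction


def noiseless_filter_path(prior, y0, n_jumps):
    """Values of (Y, pi(1), pi(2)) on successive inter-jump intervals.

    prior is the initial distribution used by the filter (four Fractions),
    y0 is the observed value of Y_0 in {0, 1}.
    """
    if y0 == 1:
        p1, p2 = prior[0] / (prior[0] + prior[2]), Fraction(0)
    else:
        p1, p2 = Fraction(0), prior[1] / (prior[1] + prior[3])
    y = y0
    path = [(y, p1, p2)]
    for _ in range(n_jumps):
        dy = 1 - 2 * y  # Y flips between 0 and 1 at every jump
        # Jump updates of Proposition 3.2 (left limits on the right-hand side).
        new_p1 = p1 + (1 - p2) * (1 - y) * dy + p1 * y * dy
        new_p2 = p2 - p2 * (1 - y) * dy - p1 * y * dy
        y, p1, p2 = y + dy, new_p1, new_p2
        path.append((y, p1, p2))
    return path


nu = (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10))
beta = (Fraction(1, 4),) * 4
true_path = noiseless_filter_path(nu, 1, 8)
wrong_path = noiseless_filter_path(beta, 1, 8)
gaps = [abs(a[1] - b[1]) + abs(a[2] - b[2]) for a, b in zip(true_path, wrong_path)]
print(gaps)
```

For these priors the gap stays constant along the whole path, so the wrongly initialized filter never approaches the optimal one, in line with the discussion above.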

4. Main results. 

4.1. Ergodic case. Markov chain $X$ is ergodic, if and only if all states of its
transition intensities matrix $\Lambda$ communicate, i.e., for any pair of indices $i$ and $j$, a
string of indices $\{\ell_1, \dots, \ell_m\}$ can be found so that $\lambda_{i\ell_1}\lambda_{\ell_1\ell_2} \cdots \lambda_{\ell_m j} \ne 0$.
In this case, the distribution of $X_t$ converges to the positive invariant distribution $\mu$
being the unique solution of $\Lambda^*\mu = 0$ in the class of vectors with positive entries the
sum of which is equal to one.

Theorem 4.1. If all states of $\Lambda$ communicate, then there exists a positive
constant $c$ such that for any $\nu$ and $\beta$

$\varlimsup_{t\to\infty} \frac{1}{t} \log \|\pi_t^{\beta\nu} - \pi_t^\nu\| \le -c$, a.s.

Remark 1. Clearly, Theorem 4.1 provides (1.6). Also it allows one to conclude
that $\lim_{t\to\infty} \|\pi_t^{\beta\nu} - \pi_t^\nu\| = 0$, a.s. for $\beta$ concentrated in a single state of $\mathbb{S}$. Then, in
particular, we have

lim 117^-7^11=0 

t^oo 

which is the main argument in the proof of existence of the unique invariant measure
for the process $(\pi_t)_{t\ge 0}$. This fact corroborates Kunita's result from [23] in the finite
state space setup of Theorem 4.1.

Actually, Theorem 4.1 verifies the logarithmic rate in $t \to \infty$ which is in general
a function of $\Lambda$, $h$ and $\sigma$. However stronger assumptions on $\Lambda$ guarantee exponential
or logarithmic rates, regardless of $h$ and $\sigma$ ($\sigma$ is only required to be nonzero).

Theorem 4.2. Assume all states of $\Lambda$ communicate. Then

(4.1) $\varlimsup_{t\to\infty} \frac{1}{t} \log \|\pi_t^{\beta\nu} - \pi_t^\nu\| \le -\sum_{r=1}^n \mu_r \min_{i \ne r} \lambda_{ri}$.

Remark 2. The bound (4.1) is negative if at least one row of $\Lambda$ has all nonzero
entries.
Theorem 4.3. Assume all entries of $\Lambda$ are nonzero.
1. If $\nu \ll \beta$, then

(4.2) £7||7r£ " - 7r t l < «X) exp ( - 2tv T V X p* X ip) > t>0 - 



3=1 



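2. If $\nu \sim \beta$, then

(4.3) $\|\pi_t^{\beta\nu} - \pi_t^\nu\| \le n^2 \max_i \frac{d\nu}{d\beta}(a_i) \max_j \frac{d\beta}{d\nu}(a_j) \exp\big(-2t \min_{p \ne q} \sqrt{\lambda_{pq}\lambda_{qp}}\big)$, $\quad t \ge 0$.
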

4.2. Nonergodic case. Let $m \ge 2$ and $\Lambda$ be given in (1.3). If $X_0 \in \mathbb{S}_j$, then $X$
is a Markov process with values in $\mathbb{S}_j$ with transition intensities matrix $\Lambda_j$. We denote
this process by $X^j$. In addition to $h$, introduce column vectors $h_j$, $j = 1, \dots, m$ with
entries $h(a_1^j), \dots, h(a_{n_j}^j)$ respectively.

Theorem 4.4. Assume the following.
A-1. For any $j$, all states of $\Lambda_j$ communicate.
A-2. For each $j$, $k$ with $j \ne k$ either

or 

$h_j^*\,\mathrm{diag}(\mu^j)\Lambda_j^q h_j \ne h_k^*\,\mathrm{diag}(\mu^k)\Lambda_k^q h_k$, for some $0 \le q \le n_j + n_k - 1$.

Then the asymptotic stability (1.6) holds true.

The condition (A-1) is inherited from Theorem 4.1 to ensure the stability within
each ergodic class, while under (A-2) $\mathcal{Y}_{[0,\infty)}$ completely identifies the class in which
$X$ actually resides.

5. Proofs for the ergodic case. Recall that under $m = 1$, $X$ is a homogeneous
ergodic Markov chain with values in the finite alphabet $\mathbb{S} = \{a_1, \dots, a_n\}$ with the
transition intensities matrix $\Lambda$. The unique invariant measure $\mu = (\mu_1, \dots, \mu_n)$ is
the positive distribution on $\mathbb{S}$. Let $\nu$ be the distribution of $X_0$ and $\beta$ a probability
measure on $\mathbb{S}$. The observation process $Y$ is defined in (1.1). Recall that the entries of
$\pi_t^\nu$ and $\pi_t^{\beta\nu}$ are the true and "wrong" conditional probabilities respectively as defined
in the introduction.

5.1. The proof of Theorem 4.1. We use the method proposed by Atar and
Zeitouni in [2], which is elaborated for the considered filtering setup for the reader's
convenience.

Recall the following facts from the theory of nonnegative matrices. For a pair
$(p, q)$ of nonnegative measures on $\mathbb{S}$ (i.e., vectors with nonnegative entries), the Hilbert
projective metric $H(p, q)$ is defined as follows (see, e.g., [HE|):

(5.1) $H(p, q) = \begin{cases} \log \dfrac{\max_{j:\, q_j > 0} (p_j/q_j)}{\min_{i:\, q_i > 0} (p_i/q_i)}, & p \sim q, \\ \infty, & p \not\sim q. \end{cases}$

The Hilbert metric is known to satisfy the following properties:

1. $H(c_1 p, c_2 q) = H(p, q)$ for any positive constants $c_1$ and $c_2$.

2. For a matrix $A$ with nonnegative entries $(A_{ij})$,

$H(Ap, Aq) \le \tau(A) H(p, q)$,

where $\tau(A) = \dfrac{1 - \sqrt{\psi(A)}}{1 + \sqrt{\psi(A)}}$ is the Birkhoff contraction coefficient with

$\psi(A) = \min_{i,j,k,\ell} \dfrac{A_{ik}A_{j\ell}}{A_{jk}A_{i\ell}}$.

3. $\|p - q\| \le \dfrac{2}{\log 3} H(p, q)$ (0 Lemma 1]).
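The three properties of the Hilbert metric listed above can be verified numerically on a small example. The sketch below is illustrative only; it is restricted to vectors and matrices with strictly positive entries, and the function names and the particular matrix and vectors are assumptions made here rather than objects from the paper.

```python
import math


def hilbert_metric(p, q):
    """Hilbert projective metric between two vectors with positive entries."""
    ratios = [pi / qi for pi, qi in zip(p, q)]
    return math.log(max(ratios) / min(ratios))


def birkhoff_tau(a):
    """Birkhoff contraction coefficient of a matrix with positive entries."""
    rows, cols = range(len(a)), range(len(a[0]))
    psi = min(a[i][k] * a[j][l] / (a[j][k] * a[i][l])
              for i in rows for j in rows for k in cols for l in cols)
    root = math.sqrt(psi)
    return (1.0 - root) / (1.0 + root)


def mat_vec(a, p):
    """Matrix-vector product A p."""
    return [sum(a[i][k] * p[k] for k in range(len(p))) for i in range(len(a))]


def normalize(p):
    total = sum(p)
    return [v / total for v in p]


a = [[2.0, 1.0], [1.0, 3.0]]
p, q = [1.0, 2.0], [3.0, 1.0]
before = hilbert_metric(p, q)
after = hilbert_metric(mat_vec(a, p), mat_vec(a, q))
tv = sum(abs(x - y) for x, y in zip(normalize(p), normalize(q)))
print(before, after, birkhoff_tau(a), tv)
```

Here `after` does not exceed `birkhoff_tau(a) * before` (property 2), rescaling either argument leaves the metric unchanged (property 1), and the total variation distance of the normalized vectors stays below $(2/\log 3)$ times the metric (property 3).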
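Returning to the filtering problem, let us first consider the special case when
$\nu = \mu$ and thus the signal is the stationary Markov chain. It is well known that
$\pi_t^\mu = \eta_t^\mu / \langle 1, \eta_t^\mu \rangle$, where $1$ denotes the vector with unit entries, $\langle \cdot, \cdot \rangle$ is the usual inner
product and $\eta_t^\mu$ solves the Zakai equation

(5.2) $d\eta_t^\mu = \Lambda^*\eta_t^\mu\,dt + \sigma^{-2}\,\mathrm{diag}(h)\,\eta_t^\mu\,dY_t^\mu$

subject to $\eta_0^\mu = \mu$. Similarly, $\pi_t^{\beta\mu} = \eta_t^{\beta\mu} / \langle 1, \eta_t^{\beta\mu} \rangle$, where $\eta_t^{\beta\mu}$ is the solution of (5.2)
subject to $\eta_0^{\beta\mu} = \beta$.

The Zakai equation possesses the unique strong solution which is linear with
respect to the initial condition. Hence, $\eta_t^\mu = J_{[0,t]}\mu$ and $\eta_t^{\beta\mu} = J_{[0,t]}\beta$, $t \ge 0$, where
$J_{[0,t]}$ is the random Cauchy matrix corresponding to (5.2).

The matrix $J_{[0,t]}$ can be factored (here $\lfloor t \rfloor$ is the integer part of $t$):

$J_{[0,t]} = J_{[\lfloor t \rfloor, t]} \Big( \prod_{n=2}^{\lfloor t \rfloor} J_{[n-1,n]} \Big) J_{[0,1]}$.

The properties of Hilbert metric, listed above, provide

$\|\pi_t^\mu - \pi_t^{\beta\mu}\| \le \frac{2}{\log 3}\,\tau\big(J_{[\lfloor t \rfloor, t]}\big) \prod_{n=2}^{\lfloor t \rfloor} \tau\big(J_{[n-1,n]}\big)\, H\big(J_{[0,1]}\mu, J_{[0,1]}\beta\big)$.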

Assume for a moment that $H(J_{[0,1]}\mu, J_{[0,1]}\beta) < \infty$ a.s. Then

(5.3) $\varlimsup_{t\to\infty} \frac{1}{t} \log \|\pi_t^\mu - \pi_t^{\beta\mu}\| \le \varlimsup_{t\to\infty} \frac{1}{t} \sum_{n=2}^{\lfloor t \rfloor} \log \tau\big(J_{[n-1,n]}\big)$

$\le \lim_{t\to\infty} \frac{1}{\lfloor t \rfloor} \sum_{n=2}^{\lfloor t \rfloor} \big\{-1 \vee \log \tau\big(J_{[n-1,n]}\big)\big\} = E\big[-1 \vee \log \tau\big(J_{[0,1]}\big)\big] \le 0$.

The equality is implied by the law of large numbers, which is valid since $-1 \le
\{-1 \vee \log \tau(J_{[n-1,n]})\} \le 0$ and $\log \tau(J_{[n-1,n]})$ is generated by

{X? - X%_ ± , W s - W n -i}, n-l<s<n, 

where the processes $X^\mu$ and $W$ are independent and $X^\mu$ is an ergodic Markov chain.
Let $J^\nu_{[n-1,n]}$ be the matrices defined similarly to $J_{[n-1,n]}$ with $Y^\mu$ replaced by $Y^\nu$.

Recall that $\mu$ is the positive measure on $\mathbb{S}$, so that $\nu \ll \mu$ and, in turn, $Q^\nu \ll Q^\mu$
(here $Q^\mu$ is the distribution of $Y^\mu$).

Since (5.3) holds $Q^\mu$-a.s., it also holds $Q^\nu$-a.s., i.e., with $J_{[n-1,n]}$ replaced by
$J^\nu_{[n-1,n]}$, which gives

Theorem 5.1. (version of Theorem 1(a) in Atar and Zeitouni [2]) Assume that
all states of $\Lambda$ communicate, i.e., $X$ is an ergodic Markov chain. Assume $J_{[0,1]}\beta$ and
$J_{[0,1]}\nu$ have positive entries a.s. Then,

(5.4) $\varlimsup_{t\to\infty} \frac{1}{t} \log \|\pi_t^\nu - \pi_t^{\beta\nu}\| \le E\big[-1 \vee \log \tau\big(J_{[0,1]}\big)\big]$, a.s.

Now the statement of Theorem 4.1 follows from the lemma below.

Lemma 5.2. The right-hand side of (5.4) is strictly negative.

Proof. It suffices to show that all entries of $J_{[0,1]}$ are positive a.s. For fixed $i$, $j$
we have

$J_{[0,t]}(i,j) = \delta_{ij} + \int_0^t J_{[0,s]}(i,j)\big[\lambda_{ii}\,ds + \sigma^{-2}h(a_i)\,dY_s^\mu\big] + \int_0^t \sum_{r \ne i} \lambda_{ri} J_{[0,s]}(r,j)\,ds$.

With the help of the Itô formula and with

$\phi_t(i) = \exp\{\lambda_{ii}t + \sigma^{-2}h(a_i)Y_t^\mu - (1/2)\sigma^{-2}h^2(a_i)t\}$

we derive

(5.5) $J_{[0,t]}(j,j) = \phi_t(j)\Big(1 + \int_0^t \phi_s^{-1}(j) \sum_{r \ne j} \lambda_{rj} J_{[0,s]}(r,j)\,ds\Big)$,

$J_{[0,t]}(i,j) = \phi_t(i) \int_0^t \phi_s^{-1}(i) \sum_{r \ne i} \lambda_{ri} J_{[0,s]}(r,j)\,ds$, $\quad i \ne j$.

Also notice that the entries of $J_{[0,t]}$ are unnormalized conditional probabilities and so
nonnegative a.s. Since all states of $\Lambda$ communicate, for any pair of indices $(i, j)$ there is
a string of indices $j = i_\ell, \dots, i_1 = i$ such that $\lambda_{i_\ell i_{\ell-1}} \cdots \lambda_{i_2 i_1} > 0$. So from (5.5), it
follows that a.s.

$J_{[0,t]}(i_\ell, i_\ell) \ge \phi_t(i_\ell) > 0$,

$J_{[0,t]}(i_{\ell-1}, i_\ell) \ge \phi_t(i_{\ell-1}) \int_0^t \phi_s^{-1}(i_{\ell-1}) \lambda_{i_\ell i_{\ell-1}} J_{[0,s]}(i_\ell, i_\ell)\,ds > 0$,

$J_{[0,t]}(i_{\ell-2}, i_\ell) \ge \phi_t(i_{\ell-2}) \int_0^t \phi_s^{-1}(i_{\ell-2}) \lambda_{i_{\ell-1} i_{\ell-2}} J_{[0,s]}(i_{\ell-1}, i_\ell)\,ds > 0$

for any $t > 0$, and so on until we get $J_{[0,t]}(i_1, i_\ell) > 0$, $t > 0$. □

5.2. The proof of Theorem 4.2. Denote $p_{ji}(t) = P(X_0^\beta = a_j \mid \mathcal{Y}^\beta_{[0,t]}, X_t^\beta = a_i)$.
If $\beta$ is a positive distribution, then by Lemma 9.5 in j^Hl Chap. 9] we have



(5.6) 



i, i = i 
0, jV» 



Remark 3. By the arguments, used in the proof of Lemma 5.2, it can be readily
shown that $\pi_t^\beta(i) > 0$ a.s., $i = 1, \dots, n$ for any $t > 0$. Then (5.6) remains valid for
$t \ge t_0$ for any $t_0 > 0$ initialized by

$p_{ji}(t_0) = P(X_0^\beta = a_j \mid \mathcal{Y}^\beta_{[0,t_0]}, X_{t_0}^\beta = a_i)$.

Set $i^\circ(t) = \mathrm{argmax}_{i \in \mathbb{S}}\, p_{ji}(t)$ and $i_\diamond(t) = \mathrm{argmin}_{i \in \mathbb{S}}\, p_{ji}(t)$ (if the maximum or the
minimum are attained at several indices, the lowest one is taken by convention). Set

(5.7) $\rho^\circ(t) := p_{j i^\circ(t)}(t)$ and $\rho_\diamond(t) := p_{j i_\diamond(t)}(t)$.

Lemma 5.3. The processes $\rho^\circ(t)$ and $\rho_\diamond(t)$ have absolutely continuous paths with

(5.8) $d\rho^\circ(t) = \sum_{i=1}^n I(i^\circ(t) = i)\,\dot p_{ji}(t)\,dt$,

$d\rho_\diamond(t) = \sum_{i=1}^n I(i_\diamond(t) = i)\,\dot p_{ji}(t)\,dt$.

The proof of this lemma uses two results formulated in Propositions 5.4 and 5.5
below.

Proposition 5.4. (Theorem A.6.3 in Dupuis and Ellis [2JD-) Let $g = g(t)$ be an
absolutely continuous function mapping $[0, 1]$ into $\mathbb{R}$. Then for each real number $a$
the set $\{t : g(t) = a, \dot g(t) \ne 0\}$ has Lebesgue measure 0.

Proposition 5.5. Let $X(t,\omega)$ be a random process with absolutely continuous
paths with respect to $dt$ in the sense that there exists a measurable random process
$x(t,\omega)$ such that $\int_0^t |x(s,\omega)|\,ds < \infty$ a.s., $t > 0$, and

(5.9) $X(t,\omega) = X(0,\omega) + \int_0^t x(s,\omega)\,ds$.

Then

$|X(t,\omega)| = |X(0,\omega)| + \int_0^t \mathrm{sign}(X(s,\omega))\,x(s,\omega)\,ds$,

where sign(0) = 0.

Proof. Set $V_t(\omega) = \int_0^t |x(s,\omega)|\,ds$ and notice that for any $t' < t''$ it holds that

$\big||X(t'',\omega)| - |X(t',\omega)|\big| \le |X(t'',\omega) - X(t',\omega)| \le V_{t''}(\omega) - V_{t'}(\omega)$.

Hence, for fixed $\omega$, the function $|X(t,\omega)|$ possesses bounded total variation for any
finite time interval. Denote by $U_t(\omega)$ this total variation corresponding to $[0,t]$. Obviously,
$dU_t(\omega) \ll dV_t(\omega) \ll dt$. Recall that $U_t(\omega) = U'_t(\omega) + U''_t(\omega)$, where $U'_t(\omega)$,
$U''_t(\omega)$ are increasing continuous in $t$ functions such that for any $t > 0$ and a measurable
set $A$ from $\mathbb{R}_+$, $\int_{A \cap [0,t]} dU''_s(\omega) = 0$ and $\int_{(\mathbb{R}_+ \setminus A) \cap [0,t]} dU'_s(\omega) = 0$, and at the same
time, $|X(t,\omega)| - |X(0,\omega)| = U''_t(\omega) - U'_t(\omega)$. Since $dU'_t \ll dU_t(\omega)$, $dU''_t \ll dU_t(\omega)$, it follows
$d|X(t,\omega)| \ll dU_t(\omega) \ll dV_t(\omega) \ll dt$ and so that

(5.10) $|X(t,\omega)| = |X(0,\omega)| + \int_0^t g(s,\omega)\,ds$

though we may not claim that $g(t,\omega)$ is measurable in $(t,\omega)$.

Now, we show that $\mathrm{sign}(X(s,\omega))x(s,\omega)$ is a measurable version of $g(s,\omega)$. By
(5.9), we have $X^2(t,\omega) = X^2(0,\omega) + 2\int_0^t X(s,\omega)x(s,\omega)\,ds$. At the same time by
(5.10) it holds $|X(t,\omega)|^2 = |X(0,\omega)|^2 + 2\int_0^t |X(s,\omega)|g(s,\omega)\,ds$. Hence, the following
identity is valid: for any $t > 0$

$\int_0^t |X(s,\omega)|g(s,\omega)\,ds = \int_0^t X(s,\omega)x(s,\omega)\,ds$.

Therefore, $|X(s,\omega)|g(s,\omega) = X(s,\omega)x(s,\omega)$ for almost all $s$ with respect to Lebesgue
measure. Consequently, we have $I(|X(s,\omega)| \ne 0)g(s,\omega) = \mathrm{sign}(X(s,\omega))x(s,\omega)$ for
almost all $s$ with respect to Lebesgue measure. It remains to show that

$I(X(s,\omega) = 0)g(s,\omega) = 0$

for almost all $s$ with respect to Lebesgue measure. Taking into account (5.10), it
suffices to prove that $\int_0^\infty I(X(s,\omega) = 0)\,d|X(s,\omega)| = 0$, a.s. On the other hand,
whereas $d|X(t,\omega)| \ll dV_t(\omega)$, it suffices to show that $\int_0^\infty I(X(s,\omega) = 0)\,dV_s(\omega) = 0$,
a.s. The latter holds by Proposition 5.4. □

Now we give the proof for Lemma 15.31 

Proof. Let us introduce p°' l {t) = pji V P j2 V- • -Vpji and p^,i(t) = pji /\Pj2 A- ■ ■ Apji 
and notice that p°' n (t) = p°(t), p<,,n(t) = Po{t)- 
The use of obvious identities 

P^ 2 (t)+ P<> . 2 (t) = Pjl (t)+ Pj2 (t), 

P»> 2 (t)-p <> ,2(t) = \p jl (t)-p j2 (t)\ 

and the fact, provided by Proposition 15.51 that d\pj\(t) — pj%{t)\ = p(t,ui)dt with 
measurable derivative p(ui, t), allow us to claim that /9°' 2 (t) and Po,2{t) are absolutely 
continuous with respect to dt with measurable derivatives. 

Further, taking into account p° ,l (t) = p <> ' l ~ 1 (t)V P ji and Po,%{t) = Po,i-i{t) A Pji(t) 
and consequent identities 

p o,i {t ) + p *,i-i (t) A p .. {t) = p o,i-l {t) + p .. {t) 

p o,i(t) - p ^-\t) A Pji (t) = \p^-\t) - Pji (t)\ 

Po,i-l(t) V Pji(t) + P o,i(t) = P oa-l(t) + Pji(t) 
Po,i-l(t) V P ji(t) - P o,i(t) = \ P o,i-l(t) - Pji(t)\ 

absolute continuity for p°(t) and p<>(t) is verified by the induction method. 
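Thus, $d\rho^{\diamond}(t) = u(t)\,dt$ with some density $u(t)$ such that $\int_0^t |u(s)|\,ds < \infty$ a.s., $t > 0$. On the other hand, since $\sum_{i=1}^n I(i^{\diamond}(t) = i) = 1$, we have

$$\rho^{\diamond}(t) = \rho^{\diamond}(0) + \int_0^t \sum_{i=1}^n I(i^{\diamond}(s) = i)\,u(s)\,ds.$$

So, it suffices to show that for any $t > 0$ and any $i = 1, 2, \dots, n$

$$\int_0^t I(i^{\diamond}(s) = i)\,|u(s) - \dot\rho_{ji}(s)|\,ds = 0, \quad \text{a.s.}$$

The latter holds true by Proposition 5.4, since

$$\int_0^t I(i^{\diamond}(s) = i)\,|u(s) - \dot\rho_{ji}(s)|\,ds = \int_0^t I(\rho^{\diamond}(s) - \rho_{ji}(s) = 0)\,|u(s) - \dot\rho_{ji}(s)|\,ds$$

$$= \int_0^t I(\rho^{\diamond}(s) - \rho_{ji}(s) = 0,\ u(s) - \dot\rho_{ji}(s) \ne 0)\,|u(s) - \dot\rho_{ji}(s)|\,ds = 0.$$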

□ 
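The max/min identities behind the induction in the proof of Lemma 5.3 can be checked on concrete numbers. The sketch below is only an illustration: the function name `running_max_min` and the sample values are our own assumptions, not part of the paper. It builds the running maximum and minimum one element at a time using nothing but sums and absolute values, which is exactly why absolute continuity is inherited at each step.

```python
from fractions import Fraction


def running_max_min(values):
    """Build running maxima and minima one element at a time.

    Uses only the identities
        max(a, b) = (a + b + |a - b|) / 2,
        min(a, b) = (a + b - |a - b|) / 2,
    i.e. the same sum/absolute-value relations used in the induction step.
    """
    hi, lo = [values[0]], [values[0]]
    for v in values[1:]:
        hi.append((hi[-1] + v + abs(hi[-1] - v)) / 2)
        lo.append((lo[-1] + v - abs(lo[-1] - v)) / 2)
    return hi, lo


# Hypothetical sample values standing in for rho_{j1}(t), ..., rho_{jn}(t) at a fixed t.
sample = [Fraction(n, 10) for n in (3, 7, 1, 9, 4)]
hi, lo = running_max_min(sample)
```

Exact rational arithmetic is used so that the comparison with the built-in `max` and `min` involves no rounding.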

Lemma 5.6. Under the assumptions of Theorem 4.2,

$$\varlimsup_{t\to\infty} \frac{1}{t} \log \max_{1\le j,k,\ell\le n} |\rho_{jk}(t) - \rho_{j\ell}(t)| \le -\sum_{r=1}^n \mu_r \min_{i\ne r} \lambda_{ri}. \tag{5.11}$$

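Proof. By (5.6) and (5.8), we have²

$$\frac{d\rho^{\diamond}(t)}{dt} = \sum_{r\ne i^{\diamond}(t)} \lambda_{r i^{\diamond}(t)} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i^{\diamond}(t))} \big(\rho_{jr}(t) - \rho^{\diamond}(t)\big), \qquad \frac{d\rho_{\diamond}(t)}{dt} = \sum_{r\ne i_{\diamond}(t)} \lambda_{r i_{\diamond}(t)} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i_{\diamond}(t))} \big(\rho_{jr}(t) - \rho_{\diamond}(t)\big). \tag{5.12}$$

In what follows, we will omit the time variable in $i_{\diamond}(t)$ and $i^{\diamond}(t)$ for brevity.
Set $\Delta_t = \rho^{\diamond}(t) - \rho_{\diamond}(t)$. By (5.12) we have

$$\frac{d\Delta_t}{dt} = -\sum_{r\ne i^{\diamond}} \lambda_{r i^{\diamond}} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i^{\diamond})} \big(\rho^{\diamond}(t) - \rho_{jr}(t)\big) - \sum_{r\ne i_{\diamond}} \lambda_{r i_{\diamond}} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i_{\diamond})} \big(\rho_{jr}(t) - \rho_{\diamond}(t)\big)$$

$$= -\Delta_t \left[ \sum_{r\ne i^{\diamond}} \lambda_{r i^{\diamond}} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i^{\diamond})} \left( \frac{\rho^{\diamond}(t) - \rho_{jr}(t)}{\Delta_t} \right) + \sum_{r\ne i_{\diamond}} \lambda_{r i_{\diamond}} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i_{\diamond})} \left( \frac{\rho_{jr}(t) - \rho_{\diamond}(t)}{\Delta_t} \right) \right]. \tag{5.13}$$

Letting $0/0 = 1/2$, set $\alpha_r(t) = \frac{\rho^{\diamond}(t) - \rho_{jr}(t)}{\rho^{\diamond}(t) - \rho_{\diamond}(t)}$. Then we get $1 - \alpha_r(t) = \frac{\rho_{jr}(t) - \rho_{\diamond}(t)}{\rho^{\diamond}(t) - \rho_{\diamond}(t)}$ and $0 \le \alpha_r(t) \le 1$, and (5.13) implies

$$\frac{d\Delta_t}{dt} = -\Delta_t \left[ \lambda_{i_{\diamond} i^{\diamond}} \frac{\pi^\nu_t(i_{\diamond})}{\pi^\nu_t(i^{\diamond})} + \lambda_{i^{\diamond} i_{\diamond}} \frac{\pi^\nu_t(i^{\diamond})}{\pi^\nu_t(i_{\diamond})} + \sum_{r\ne i^{\diamond},\, r\ne i_{\diamond}} \left( \alpha_r(t) \lambda_{r i^{\diamond}} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i^{\diamond})} + (1-\alpha_r(t)) \lambda_{r i_{\diamond}} \frac{\pi^\nu_t(r)}{\pi^\nu_t(i_{\diamond})} \right) \right]$$

$$\le -\Delta_t \left[ \lambda_{i_{\diamond} i^{\diamond}} \pi^\nu_t(i_{\diamond}) + \lambda_{i^{\diamond} i_{\diamond}} \pi^\nu_t(i^{\diamond}) + \sum_{r\ne i^{\diamond},\, r\ne i_{\diamond}} \big( \alpha_r(t) \lambda_{r i^{\diamond}} + (1-\alpha_r(t)) \lambda_{r i_{\diamond}} \big) \pi^\nu_t(r) \right] \tag{5.14}$$

$$\le -\Delta_t \left[ \lambda_{i_{\diamond} i^{\diamond}} \pi^\nu_t(i_{\diamond}) + \lambda_{i^{\diamond} i_{\diamond}} \pi^\nu_t(i^{\diamond}) + \sum_{r\ne i^{\diamond},\, r\ne i_{\diamond}} \big( \lambda_{r i^{\diamond}} \wedge \lambda_{r i_{\diamond}} \big) \pi^\nu_t(r) \right].$$

² In (5.12) and the relations that follow, we use for brevity a form of differential equalities (inequalities) which are valid for any $\omega$ and almost all $t$ with respect to Lebesgue measure.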

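Recall that all off-diagonal entries of $\Lambda$ are nonnegative and $\sum_{r=1}^n \lambda_{ir} = 0$ for any $i$, so that $|\lambda_{ii}| \ge \lambda_{ir}$ for $r \ne i$. Then, $|\lambda_{i^{\diamond} i^{\diamond}}| \wedge |\lambda_{i^{\diamond} i_{\diamond}}| = \lambda_{i^{\diamond} i_{\diamond}}$, $|\lambda_{i_{\diamond} i_{\diamond}}| \wedge |\lambda_{i_{\diamond} i^{\diamond}}| = \lambda_{i_{\diamond} i^{\diamond}}$, and (5.14) provides

$$\frac{d\Delta_t}{dt} \le -\Delta_t \sum_{r=1}^n \big( |\lambda_{r i^{\diamond}}| \wedge |\lambda_{r i_{\diamond}}| \big) \pi^\nu_t(r) \le -\Delta_t \sum_{r=1}^n \min_{1\le i\le n} |\lambda_{ri}|\, \pi^\nu_t(r) = -\Delta_t \sum_{r=1}^n \pi^\nu_t(r) \min_{i\ne r} \lambda_{ri}.$$

Since the derivative is defined for each $\omega$ and almost everywhere (a.e.) in $t$ with respect to $dt$, the above inequality $\frac{d\Delta_t}{dt} \le -\Delta_t \sum_{r=1}^n \pi^\nu_t(r) \min_{i\ne r} \lambda_{ri}$ is also valid a.e.

So, it allows us to define a.e. the function

$$H(t) = -\Delta_t \sum_{r=1}^n \pi^\nu_t(r) \min_{i\ne r} \lambda_{ri} - \frac{d\Delta_t}{dt}.$$

Moreover, for definiteness, we may redefine $H(t)$ everywhere so that $H(t) \ge 0$. Then we have

$$d\Delta_t = -\Big[ \Delta_t \sum_{r=1}^n \pi^\nu_t(r) \min_{i\ne r} \lambda_{ri} + H(t) \Big]\,dt.$$

Notice also that $\int_0^t |H(s)|\,ds < \infty$, a.s. for any $t > 0$ and recall that $\Delta_0 = 1$. Then, we get

$$\Delta_t = \exp\Big( -\int_0^t \sum_{r=1}^n \pi^\nu_s(r) \min_{i\ne r} \lambda_{ri}\,ds \Big) - \int_0^t \exp\Big( -\int_s^t \sum_{r=1}^n \pi^\nu_v(r) \min_{i\ne r} \lambda_{ri}\,dv \Big) H(s)\,ds
$$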




P. BAXENDALE, P. CHIGANSKY, R. LIPTSER 



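and in turn

$$\frac{1}{t} \log \Delta_t \le -\sum_{r=1}^n \Big( \min_{i\ne r} \lambda_{ri} \Big) \frac{1}{t} \int_0^t \pi^\nu_s(r)\,ds.$$

So, it is left to verify that

$$\lim_{t\to\infty} \frac{1}{t} \int_0^t \pi^\nu_s(r)\,ds = \mu_r, \quad \text{a.s.} \tag{5.15}$$

Similarly to (1.4), $\pi^\nu_t$ satisfies

$$d\pi^\nu_t = \Lambda^* \pi^\nu_t\,dt + \sigma^{-2} \big( \mathrm{diag}(\pi^\nu_t) - \pi^\nu_t (\pi^\nu_t)^* \big) h \big( dY^\nu_t - h^* \pi^\nu_t\,dt \big).$$

Recall that $\sigma^{-1} \big( Y^\nu_t - \int_0^t h^* \pi^\nu_s\,ds \big)$ is the innovation Wiener process (see, e.g., Theorem 9.1 in Chapter 10 in [30]). Hence $M_t = \int_0^t \big( \mathrm{diag}(\pi^\nu_s) - \pi^\nu_s (\pi^\nu_s)^* \big) h \big( dY^\nu_s - h^* \pi^\nu_s\,ds \big)$ is a vector-valued continuous martingale. Its entries $M_t(i)$, $i = 1, \dots, n$, have predictable quadratic variation processes $\langle M(i) \rangle_t$ with the following property: for some positive constant $c$, $d\langle M(i) \rangle_t \le c\,dt$. Then by Theorem 10 in Chapter 3 in [31], $\lim_{t\to\infty} \frac{1}{t} M_t(i) = 0$, a.s. This fact and the boundedness of $\pi^\nu_t$ provide $\Lambda^* \lim_{t\to\infty} \frac{1}{t} \int_0^t \pi^\nu_s\,ds = 0$. The vector $Z_t = \frac{1}{t} \int_0^t \pi^\nu_s\,ds$ has nonnegative entries, whose sum equals 1. Therefore the limit vector $Z_\infty$, obeying the same property, is the unique solution of the linear algebraic equation $\Lambda^* Z_\infty = 0$, i.e., $Z_\infty = \mu$. □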
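The bound (5.11) is explicit once the invariant distribution is known. The following sketch, under the assumption of an irreducible generator given as a list of rows, solves $\Lambda^*\mu = 0$ with $\sum_r \mu_r = 1$ by exact Gaussian elimination and evaluates $\sum_r \mu_r \min_{i\ne r}\lambda_{ri}$. The function names and the two-state example are ours and purely illustrative.

```python
from fractions import Fraction


def invariant_distribution(lam):
    """Solve Lambda^* mu = 0, sum(mu) = 1, for an irreducible generator `lam`.

    `lam[r][i]` is the transition intensity from state r to state i.
    Irreducibility is assumed; otherwise the solution is not unique and the
    pivot search below may fail.
    """
    n = len(lam)
    # First n-1 equations: (Lambda^* mu)_i = sum_r lam[r][i] * mu[r] = 0.
    a = [[Fraction(lam[r][i]) for r in range(n)] + [Fraction(0)] for i in range(n - 1)]
    # Last equation: normalisation sum(mu) = 1.
    a.append([Fraction(1)] * n + [Fraction(1)])
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [x / a[col][col] for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


def lemma_5_6_rate(lam):
    """Return sum_r mu_r * min_{i != r} lam[r][i], the rate appearing in (5.11)."""
    n = len(lam)
    mu = invariant_distribution(lam)
    return sum(mu[r] * min(lam[r][i] for i in range(n) if i != r) for r in range(n))


# Hypothetical two-state generator: intensity 1 from state 0 to 1, and 2 from 1 to 0.
lam = [[-1, 1], [2, -2]]
mu = invariant_distribution(lam)
rate = lemma_5_6_rate(lam)
```

For this example the invariant distribution is $(2/3, 1/3)$ and the rate equals $4/3$. Note that the rate vanishes as soon as some off-diagonal entry of a row is zero, which is the case handled separately by Lemma 5.7.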

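To prove Theorem 4.2, without loss of generality, due to Remark we may assume that $\nu \sim \beta$. Then, we show that for any $t > 0$ and $i = 1, \dots, n$

$$|\pi^{\beta\nu}_t(i) - \pi^\nu_t(i)| \le n \max_{a_j\in S} \frac{d\nu}{d\beta}(a_j) \max_{a_j\in S} \frac{d\beta}{d\nu}(a_j) \max_{1\le i,j,k\le d} |\rho_{ji}(t) - \rho_{jk}(t)|. \tag{5.16}$$

Recall that $Q^\nu$ and $Q^\beta$ are distributions of $(X^\nu, Y^\nu)$ and $(X^\beta, Y^\beta)$ respectively, which are equivalent, by virtue of $\nu \sim \beta$, with $\frac{dQ^\beta}{dQ^\nu}(x,y) = \frac{d\beta}{d\nu}(x_0)$.

Now, we show that for any $i = 1, \dots, d$ and $t > 0$, $Q^\nu$- and $Q^\beta$-a.s.

$$\pi^{\beta\nu}_t(i) = \frac{\sum_{j=1}^n \frac{d\beta}{d\nu}(a_j)\, P\big(X^\nu_t = a_i, X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big)}{\sum_{j=1}^n \frac{d\beta}{d\nu}(a_j)\, P\big(X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big)}. \tag{5.17}$$

To this end, with any bounded measurable function $\psi_t(y)$ of the observation path on $[0,t]$, write

$$E \psi_t(Y^\nu) \pi^{\beta\nu}_t(i) E\Big( \frac{d\beta}{d\nu}(X^\nu_0) \,\Big|\, \mathcal{Y}^\nu_{[0,t]} \Big) = E \psi_t(Y^\nu) \pi^{\beta\nu}_t(i) \frac{d\beta}{d\nu}(X^\nu_0)$$

$$= E \psi_t(Y^\nu) \pi^{\beta\nu}_t(i) \frac{dQ^\beta}{dQ^\nu}(X^\nu, Y^\nu) = E \psi_t(Y^\beta) \pi^\beta_t(i)$$

$$= E \psi_t(Y^\beta) I(X^\beta_t = a_i) = E \psi_t(Y^\nu) I(X^\nu_t = a_i) \frac{dQ^\beta}{dQ^\nu}(X^\nu, Y^\nu).$$

Hence, by the arbitrariness of $\psi_t(y)$,

$$\pi^{\beta\nu}_t(i)\, E\Big( \frac{d\beta}{d\nu}(X^\nu_0) \,\Big|\, \mathcal{Y}^\nu_{[0,t]} \Big) = E\Big( I(X^\nu_t = a_i) \frac{d\beta}{d\nu}(X^\nu_0) \,\Big|\, \mathcal{Y}^\nu_{[0,t]} \Big).$$

Further, $Q^\nu \sim Q^\beta$ provides $E\big( \frac{d\beta}{d\nu}(X^\nu_0) \,\big|\, \mathcal{Y}^\nu_{[0,t]} \big) > 0$, $Q^\nu$- and $Q^\beta$-a.s., so that

$$\pi^{\beta\nu}_t(i) = \frac{E\big( I(X^\nu_t = a_i) \frac{d\beta}{d\nu}(X^\nu_0) \,\big|\, \mathcal{Y}^\nu_{[0,t]} \big)}{E\big( \frac{d\beta}{d\nu}(X^\nu_0) \,\big|\, \mathcal{Y}^\nu_{[0,t]} \big)}$$

and it remains to notice that

$$E\Big( I(X^\nu_t = a_i) \frac{d\beta}{d\nu}(X^\nu_0) \,\Big|\, \mathcal{Y}^\nu_{[0,t]} \Big) = \sum_{j=1}^n \frac{d\beta}{d\nu}(a_j)\, P\big( X^\nu_t = a_i, X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]} \big).$$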



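Taking into consideration (5.17), we find

$$\pi^{\beta\nu}_t(i) - \pi^\nu_t(i) = \frac{\sum_{j=1}^n \frac{d\beta}{d\nu}(a_j) \Big[ P\big(X^\nu_t = a_i, X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big) - \pi^\nu_t(i)\, P\big(X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big) \Big]}{E\big( \frac{d\beta}{d\nu}(X^\nu_0) \,\big|\, \mathcal{Y}^\nu_{[0,t]} \big)}.$$

Then, since by the Jensen inequality $1 / E\big( \frac{d\beta}{d\nu}(X^\nu_0) \,\big|\, \mathcal{Y}^\nu_{[0,t]} \big) \le E\big( \frac{d\nu}{d\beta}(X^\nu_0) \,\big|\, \mathcal{Y}^\nu_{[0,t]} \big)$, we get the chain of estimates

$$|\pi^{\beta\nu}_t(i) - \pi^\nu_t(i)| \le \max_{a_j\in S} \frac{d\nu}{d\beta}(a_j) \max_{a_j\in S} \frac{d\beta}{d\nu}(a_j) \sum_{j=1}^n \Big| P\big(X^\nu_t = a_i, X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big) - \pi^\nu_t(i)\, P\big(X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big) \Big|$$

$$\le \max_{a_j\in S} \frac{d\nu}{d\beta}(a_j) \max_{a_j\in S} \frac{d\beta}{d\nu}(a_j) \sum_{j=1}^n \Big| P\big(X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big) - P\big(X^\nu_0 = a_j \,\big|\, X^\nu_t = a_i, \mathcal{Y}^\nu_{[0,t]}\big) \Big| \tag{5.18}$$

$$= \max_{a_j\in S} \frac{d\nu}{d\beta}(a_j) \max_{a_j\in S} \frac{d\beta}{d\nu}(a_j) \sum_{j=1}^n \Big| P\big(X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big) - \rho_{ji}(t) \Big|.$$

The obvious formula $P\big(X^\nu_0 = a_j \,\big|\, \mathcal{Y}^\nu_{[0,t]}\big) = \sum_{k=1}^n \pi^\nu_t(k) \rho_{jk}(t)$ and (5.18) provide

$$|\pi^{\beta\nu}_t(i) - \pi^\nu_t(i)| \le \max_{a_j\in S} \frac{d\nu}{d\beta}(a_j) \max_{a_j\in S} \frac{d\beta}{d\nu}(a_j) \sum_{j=1}^n \Big| \sum_{k=1}^n \pi^\nu_t(k) \big( \rho_{jk}(t) - \rho_{ji}(t) \big) \Big|$$

$$\le \max_{a_j\in S} \frac{d\nu}{d\beta}(a_j) \max_{a_j\in S} \frac{d\beta}{d\nu}(a_j) \sum_{j=1}^n \sum_{k=1}^n \pi^\nu_t(k) \big| \rho_{jk}(t) - \rho_{ji}(t) \big| \tag{5.19}$$

and (5.16). Thus, by Lemma 5.6, the desired statement (4.1) holds true.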
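5.3. The proof of Theorem 4.3. We start with the following lemma.
LEMMA 5.7. Under the assumptions of Theorem 4.3, for any $t > 0$

$$\max_{1\le j,k,\ell\le n} |\rho_{jk}(t) - \rho_{j\ell}(t)| \le \exp\Big( -2t \min_{p\ne q} \sqrt{\lambda_{pq} \lambda_{qp}} \Big). \tag{5.20}$$

Proof. Here we follow the notations from Lemma 5.3. From (5.14), it follows that

$$\frac{d\Delta_t}{dt} \le -\Delta_t \left( \lambda_{i_{\diamond} i^{\diamond}} \frac{\pi^\nu_t(i_{\diamond})}{\pi^\nu_t(i^{\diamond})} + \lambda_{i^{\diamond} i_{\diamond}} \frac{\pi^\nu_t(i^{\diamond})}{\pi^\nu_t(i_{\diamond})} \right) \tag{5.21}$$

subject to $\Delta_0 = 1$. Set $\tau = \inf\{t : i^{\diamond}(t) = i_{\diamond}(t)\}$. Since $\Delta_t$ is a nonincreasing function, $\Delta_t = 0$ for $t \ge \tau$, and (5.20) holds trivially. For $t < \tau$, as previously we find

$$\Delta_t \le \exp\left\{ -\int_0^t \left( \lambda_{i_{\diamond} i^{\diamond}} \frac{\pi^\nu_s(i_{\diamond})}{\pi^\nu_s(i^{\diamond})} + \lambda_{i^{\diamond} i_{\diamond}} \frac{\pi^\nu_s(i^{\diamond})}{\pi^\nu_s(i_{\diamond})} \right) ds \right\} \le \exp\left\{ -\int_0^t \min_{x>0} \Big( \lambda_{i_{\diamond} i^{\diamond}}\, x + \lambda_{i^{\diamond} i_{\diamond}} \frac{1}{x} \Big) ds \right\}$$

$$= \exp\left\{ -\int_0^t 2 \sqrt{\lambda_{i_{\diamond} i^{\diamond}} \lambda_{i^{\diamond} i_{\diamond}}}\,ds \right\} \le \exp\Big( -2t \min_{p\ne q} \sqrt{\lambda_{pq} \lambda_{qp}} \Big),$$

and (5.20) follows. □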
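The key step in the proof of Lemma 5.7 is the elementary bound $ax + b/x \ge 2\sqrt{ab}$ for $a, b \ge 0$ and $x > 0$, which removes the dependence on the ratio of filter components. A small numerical sketch follows; the function names and the example generator are our own illustrative assumptions.

```python
import math


def lemma_5_7_rate(lam):
    """Return 2 * min_{p != q} sqrt(lam[p][q] * lam[q][p]), the exponent rate in (5.20)."""
    n = len(lam)
    return 2 * min(
        math.sqrt(lam[p][q] * lam[q][p])
        for p in range(n)
        for q in range(n)
        if p != q
    )


def amgm_gap(a, b, x):
    """Return a*x + b/x - 2*sqrt(a*b); nonnegative for a, b >= 0 and x > 0."""
    return a * x + b / x - 2 * math.sqrt(a * b)


# Hypothetical two-state generator with intensities 1 (state 0 to 1) and 4 (state 1 to 0).
lam = [[-1.0, 1.0], [4.0, -4.0]]
rate = lemma_5_7_rate(lam)

# The gap vanishes at x = sqrt(b / a) and is positive elsewhere.
gaps = [amgm_gap(1.0, 4.0, x) for x in (0.1, 0.5, 2.0, 10.0)]
```

In this example the rate is $2\sqrt{1\cdot 4} = 4$, and the minimum of $x + 4/x$ is attained at $x = 2$. The rate is positive only when every pair of states communicates directly in both directions.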

To prove the first statement of the theorem, taking into account $\nu \ll \beta$ we replicate a fragment from the proof of Proposition 2.1.

Using the notations introduced in Section 2.1, write $\pi^\nu_t(i) := \pi^\nu_t(f)$ and $\pi^{\beta\nu}_t(i) := \pi^{\beta\nu}_t(f)$ for $f(a) = I(a = a_i)$. Then,

(5.22) (i) - <(i)| < £;|^(|(^)|^ >t] ) - ^(^(^)|^ >oo) V JT^) 

and, since is a Markov process, 

Then, 



n n j 

(5 23) = EE W = <*)ijs(<h)(P{x8 = a ^o,t]) ~ P*(*)) 

n n n , 

= EEE 7 ^ = ^fo) to*) - mo) 
j=i «=i fc=i " 

n ^ 
The first statement of Theorem 4.3 follows from (5.22), (5.23), and Lemma 5.7.
The second statement follows from (5.16) and Lemma 5.7.

6. Proofs for non-ergodic case. Recall that in the non-ergodic setting under consideration

$$S = \{a^1_1, \dots, a^1_{n_1}, \dots, a^m_1, \dots, a^m_{n_m}\}, \quad m > 1,$$

with subalphabets $S_1, \dots, S_m$ noncommunicating in the sense of (1.2).

6.1. Auxiliary lemmas. In this subsection, $X^j_t$ is an independent copy of $X_t$ with the initial distribution $\mu^j$, defined on some auxiliary probability space $(\Omega, \mathcal{F}, P)$, and $E$ is the expectation with respect to $P$. Recall that $\mu^j$ is the invariant measure, so that $X^j_t$ is a stationary process.

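Lemma 6.1. Fix $r > 0$ and define $Z_n = \sum_{\ell=1}^n \big( Y_{\ell r} - Y_{(\ell-1)r} \big)^2$. Then with $n \to \infty$

$$\frac{1}{n} Z_n \to r + \sum_{j=1}^m E\Big( \int_0^r h(X^j_s)\,ds \Big)^2 I(X_0 \in S_j), \quad \text{a.s.}$$

Proof. Define

$$F(a_i) = E\Big[ \Big( \int_0^r h(X_s)\,ds \Big)^2 \,\Big|\, X_0 = a_i \Big]$$

and $\mathcal{G}_n = \sigma\{Y_{[0,nr]}\} \vee \sigma\{X_{[0,nr]}\}$. Then $E\big[ (Y_{(n+1)r} - Y_{nr})^2 \,\big|\, \mathcal{G}_n \big] = r + F(X_{nr})$, so that the sequence $M_n = Z_n - nr - \sum_{i=0}^{n-1} F(X_{ir})$ is a martingale with respect to the filtration $(\mathcal{G}_n)_{n\ge 1}$. It is easy to verify that there exists $K < \infty$ such that for all $n$ we have $E(M_{n+1} - M_n)^2 \le K$. It follows that $(1/n) M_n \to 0$ almost surely as $n \to \infty$ (see, e.g., Chapter VII, Section 5, Theorem 4 in 02).

Now consider $(1/n) \sum_{i=0}^{n-1} F(X_{ir})$. If $X_0 \in S_j$, then $X_t \in S_j$ for all $t \ge 0$ and the process is ergodic in $S_j$ with stationary distribution $\mu^j$. Applying the ergodic theorem for each class $S_j$ we obtain

$$\frac{1}{n} \sum_{i=0}^{n-1} F(X_{ir}) \to \sum_{j=1}^m E F(X^j_0)\, I(X_0 \in S_j) = \sum_{j=1}^m E\Big( \int_0^r h(X^j_s)\,ds \Big)^2 I(X_0 \in S_j)$$

as $n \to \infty$ a.s. Finally

$$\lim_{n\to\infty} \frac{1}{n} Z_n = \lim_{n\to\infty} \frac{1}{n} M_n + r + \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} F(X_{ir}) = r + \sum_{j=1}^m E\Big( \int_0^r h(X^j_s)\,ds \Big)^2 I(X_0 \in S_j)$$

and we are done. □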



With $X^j_t$ defined as in Lemma 6.1 and $r \ge 0$ let $d_j(r) = E\big( \int_0^r h(X^j_s)\,ds \big)^2$.
Lemma 6.2. For any $k \ne j$ the following are equivalent:
i. $d_k(r) = d_j(r)$ for all $r \ge 0$;

ii. $h_k^* \mathrm{diag}(\mu^k) \Lambda_k^q h_k = h_j^* \mathrm{diag}(\mu^j) \Lambda_j^q h_j$ for all $0 \le q \le n_j + n_k - 1$.
Proof. Notice first that

$$d_j(r) = 2 E \int_0^r \int_0^s h(X^j_s) h(X^j_u)\,du\,ds = 2 \int_0^r \int_0^s E h(X^j_s) h(X^j_u)\,du\,ds$$

$$= 2 \int_0^r \int_0^s E h(X^j_0) h(X^j_{s-u})\,du\,ds = 2 \int_0^r \int_0^s E h(X^j_0) h(X^j_v)\,dv\,ds.$$

Now, introduce the vector $I^j_t$ with entries $I(X^j_t = a^j_1), \dots, I(X^j_t = a^j_{n_j})$ and notice also that

$$E h(X^j_0) h(X^j_v) = E h_j^* I^j_0 (I^j_v)^* h_j = E h_j^* I^j_0 (I^j_0)^* e^{\Lambda_j v} h_j = h_j^* E\,\mathrm{diag}(I^j_0)\, e^{\Lambda_j v} h_j = h_j^* \mathrm{diag}(\mu^j) e^{\Lambda_j v} h_j.$$

Therefore $d_j(r) = 2 \int_0^r \int_0^s h_j^* \mathrm{diag}(\mu^j) e^{\Lambda_j v} h_j\,dv\,ds$ so, $d_j(0) = d_j'(0) = 0$ and

$$d_j''(r) = 2 h_j^* \mathrm{diag}(\mu^j) e^{\Lambda_j r} h_j.$$

Differentiating with respect to $r$ a further $q$ times and then putting $r = 0$ we get

$$d_j^{(2+q)}(0) = 2 h_j^* \mathrm{diag}(\mu^j) \Lambda_j^q h_j.$$

It follows immediately that if $d_k(r) = d_j(r)$ for all $r \ge 0$, then

$$h_k^* \mathrm{diag}(\mu^k) \Lambda_k^q h_k = h_j^* \mathrm{diag}(\mu^j) \Lambda_j^q h_j$$

for all $q \ge 0$ and so in particular for all $0 \le q \le n_k + n_j - 1$.

Suppose conversely that $h_j^* \mathrm{diag}(\mu^j) \Lambda_j^q h_j = h_k^* \mathrm{diag}(\mu^k) \Lambda_k^q h_k$ for all $0 \le q \le n_k + n_j - 1$. The Cayley-Hamilton theorem applied to the $(n_k + n_j) \times (n_k + n_j)$ block diagonal matrix $\begin{pmatrix} \Lambda_k & 0 \\ 0 & \Lambda_j \end{pmatrix}$ gives constants $c_0, c_1, \dots, c_{n_k+n_j-1}$ so that

$$\Lambda_k^{n_k+n_j} = \sum_{q=0}^{n_k+n_j-1} c_q \Lambda_k^q, \qquad \Lambda_j^{n_k+n_j} = \sum_{q=0}^{n_k+n_j-1} c_q \Lambda_j^q.$$

Therefore we have $h_k^* \mathrm{diag}(\mu^k) \Lambda_k^q h_k = h_j^* \mathrm{diag}(\mu^j) \Lambda_j^q h_j$ for all $q > n_j + n_k - 1$ as well. Using the fact that $e^{\Lambda_j r} = \sum_{q=0}^\infty \frac{\Lambda_j^q r^q}{q!}$, we see that $d_k''(r) = d_j''(r)$ for all $r \ge 0$, and hence $d_k(r) = d_j(r)$ for all $r \ge 0$. □
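Condition ii of Lemma 6.2 is a finite algebraic test, so it can be checked mechanically for given classes. The sketch below is an illustration only: the function names and the sample classes are our assumptions, and the invariant distributions are supplied by hand rather than computed. It compares the sequences $h^* \mathrm{diag}(\mu) \Lambda^q h$ for $q = 0, \dots, n_k + n_j - 1$ in exact arithmetic.

```python
from fractions import Fraction


def moment_sequence(h, mu, lam, count):
    """Return [h^* diag(mu) Lambda^q h for q = 0, ..., count - 1]."""
    n = len(h)
    vec = [Fraction(x) for x in h]  # holds Lambda^q h, starting with q = 0
    out = []
    for _ in range(count):
        out.append(sum(Fraction(h[i]) * Fraction(mu[i]) * vec[i] for i in range(n)))
        # Advance to Lambda^(q+1) h.
        vec = [sum(Fraction(lam[i][k]) * vec[k] for k in range(n)) for i in range(n)]
    return out


def condition_ii_holds(hk, muk, lamk, hj, muj, lamj):
    """Check condition ii of Lemma 6.2 for q = 0, ..., n_k + n_j - 1."""
    count = len(hk) + len(hj)
    return moment_sequence(hk, muk, lamk, count) == moment_sequence(hj, muj, lamj, count)


# Hypothetical symmetric two-state class with uniform invariant distribution.
lam = [[-1, 1], [1, -1]]
mu = [Fraction(1, 2), Fraction(1, 2)]

same = condition_ii_holds([1, -1], mu, lam, [-1, 1], mu, lam)       # h and -h
different = condition_ii_holds([1, -1], mu, lam, [2, -2], mu, lam)  # rescaled h
```

The first pair satisfies condition ii because the quadratic forms are unchanged when $h$ is replaced by $-h$. The second pair already differs at $q = 0$.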

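Lemma 6.3. Assume (A-2). For any $\beta$

$$\lim_{t\to\infty} E \Big| P\big( X^\beta_0 \in S_j \,\big|\, \mathcal{Y}^\beta_{[0,t]} \big) - I(X^\beta_0 \in S_j) \Big| = 0, \quad j \ge 1.$$

Proof. We use the notation $Z^{(r)}_n$ to express the dependence on $r$ of the function $Z_n$ in Lemma 6.1. We have $\frac{1}{n} Y^\beta_n \to \sum_{j=1}^m h_j^* \mu^j I(X^\beta_0 \in S_j)$ and

$$\frac{1}{n} Z^{(r)}_n \to r + \sum_{j=1}^m d_j(r)\, I(X^\beta_0 \in S_j)$$

as $n \to \infty$, a.s. Using the assumption (A-2) and Lemma 6.2, we can find an integer $\ell$ and numbers $r_i > 0$, $i = 1, \dots, \ell$, and construct a random variable of the form $V_n = \big( Y^\beta_n, Z^{(r_1)}_n - n r_1, \dots, Z^{(r_\ell)}_n - n r_\ell \big)$ so that $\frac{1}{n} V_n \to \sum_{j=1}^m v_j I(X^\beta_0 \in S_j)$ as $n \to \infty$, $P$-a.s., where $v_1, \dots, v_m$ are distinct vectors in $\mathbb{R}^{\ell+1}$. Therefore $\{X^\beta_0 \in S_j\}$ is $\mathcal{Y}^\beta_{[0,\infty)}$-measurable a.s. and the result follows immediately. □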






6.2. The proof of Theorem 4.4. By Proposition 2.1 it suffices to show that

J -Po I 



lim E 7rf — 7T( 

t — >oo 1 1 



0. 



We introduce a new filter, intermediate between 7rf and tt^" . Define the random 
variable U by U = j on the set {Xq <G Sj}, and then define 



J3o 



n?> U (i) = P(X?= ai \^ tt] ,U). 



Then 



= E 



m 

]T p(xf = ai \^ t] ,u = j) (P{U = j\^ t] ) I(U = j)) 

3=1 



<^|p(C/=i|^ t] )-J(C7=i) 

3=1 



and 



7rf°|| = P(Xf = o*!^, CO ~ P(*f = ai\&l tY Xi) 



i=l 



Y,J2 I ( U = 3) \P{X? = a^^U = j) P(xf = Oi\&l tV U = j, X?) 



i=i j=i 



j^i(u=j)\y t -7T 

3=1 



J^M 
t ) 



where ft denotes the conditional distribution of (3 restricted to the subalphabet §j. 
By Lemma lb. 31 



3=1 







1 _ I 



— > by applying Theorem 14. f I to each Sj 



while E£=iJ(tf = j) K 

Appendix A. Proof of Proposition 3.2.

Proof. (Sketch) We use the following construction for $X$. Let $X_0$ be a random variable with values in $S = \{1, 2, 3, 4\}$ and $P(X_0 = j) = \nu_j$, $j = 1, \dots, 4$. Introduce a matrix-valued process, independent of $X_0$,

$$N_t = \begin{pmatrix} -N_{12}(t) & N_{12}(t) & 0 & 0 \\ 0 & -N_{23}(t) & N_{23}(t) & 0 \\ 0 & 0 & -N_{34}(t) & N_{34}(t) \\ N_{41}(t) & 0 & 0 & -N_{41}(t) \end{pmatrix}, \tag{A.1}$$

where $N_{ij}(t)$ are independent copies of a Poisson process with unit rate. Let us consider the Itô equation

$$I_t = I_0 + \int_0^t dN_s^*\, I_{s-} \tag{A.2}$$

with $I_0$ the vector with entries $I_0(j) = I(X_0 = j)$, $j = 1, \dots, 4$. Since the jumps of the Poisson processes $N_{ij}(t)$ are disjoint, for any $t \ge 0$ the vector $I_t$ has only one nonzero entry. Moreover, since the increments of $N_t$ are independent for nonoverlapping intervals, $I_t$ is a Markov process. It is readily checked that, with the row vector $g = (1\ 2\ 3\ 4)$, $X_t = g I_t$ is a Markov process with values in $S$ and the transition intensities matrix $\Lambda$, and $I_t(j) = I(X_t = j)$, $j = 1, \dots, 4$.

We will follow Theorem 4.10.1 from [31]. The random process $Y$ has piecewise constant paths with jumps of two magnitudes, $+1$ and $-1$. Due to (A.2), its saltus measure $p(dt, dy)$ is completely described by

$$p(dt, \{1\}) = \{ I_{t-}(4)\,dN_{41}(t) + I_{t-}(2)\,dN_{23}(t) \},$$
$$p(dt, \{-1\}) = \{ I_{t-}(1)\,dN_{12}(t) + I_{t-}(3)\,dN_{34}(t) \}.$$

So, the compensator $q(dt, dy)$ of $p(dt, dy)$ with respect to the filtration $(\mathcal{Y}_{[0,t]})_{t\ge 0}$ is defined as

$$q(dt, \{1\}) = (\pi_{t-}(4) + \pi_{t-}(2))\,dt = (1 - Y_{t-})\,dt,$$
$$q(dt, \{-1\}) = (\pi_{t-}(1) + \pi_{t-}(3))\,dt = Y_{t-}\,dt.$$

Notice also that

$$p(dt, \{1\}) = (1 - Y_{t-})\,dY_t \quad \text{and} \quad p(dt, \{-1\}) = -Y_{t-}\,dY_t. \tag{A.4}$$

Equation (A.2) also gives a "drift+martingale" presentation for $I_t(1)$, $I_t(2)$:

$$dI_t(1) = (-I_t(1) + I_t(4))\,dt + dM_1(t),$$
$$dI_t(2) = (I_t(1) - I_t(2))\,dt + dM_2(t) \tag{A.5}$$

with martingales

$$M_1(t) = \int_0^t \big( -I_{s-}(1)\,d(N_{12}(s) - s) + I_{s-}(4)\,d(N_{41}(s) - s) \big),$$
$$M_2(t) = \int_0^t \big( I_{s-}(1)\,d(N_{12}(s) - s) - I_{s-}(2)\,d(N_{23}(s) - s) \big).$$

Then, by Theorem 4.10.1 in [31] adapted to the case considered, we have

$$d\pi_t(1) = (-\pi_t(1) + \pi_t(4))\,dt + \int H_1(\omega, t, y) [p(dt, dy) - q(dt, dy)],$$
$$d\pi_t(2) = (\pi_t(1) - \pi_t(2))\,dt + \int H_2(\omega, t, y) [p(dt, dy) - q(dt, dy)], \tag{A.6}$$

where $H_i(\omega, t, y)$, $i = 1, 2$, are $\mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R})$-measurable functions (here $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra on $\mathbb{R}$ and $\mathcal{P}(Y)$ is the predictable $\sigma$-algebra on $\Omega \times \mathbb{R}_+$ with respect to the filtration $(\mathcal{Y}_{[0,t]})_{t\ge 0}$). Moreover

$$H_i(\omega, t, y) = M\big( \Delta M_i + I_-(i) \,\big|\, \mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R}) \big)(\omega, t, y) - \pi_{t-}(i),$$

where $\Delta M_i$ and $I_-(i)$ are the processes $M_i(t) - M_i(t-)$ and $I_{t-}(i)$, respectively, and $M(\,\cdot \mid \mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R}))$ is the conditional expectation with respect to the measure $M(d\omega, dt, dy) = P(d\omega)\,p(dt, dy)$ given $\mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R})$.






By (A.5), $\Delta M_i(t) + I_{t-}(i) = I_t(i)$ and the structure of the compensator $q$ provides (here $\Delta I_t(i) = I_t(i) - I_{t-}(i)$)

$$M\big( I(i) \,\big|\, \mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R}) \big) - \pi_{t-}(i) = M\big( \Delta I(i) \,\big|\, \mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R}) \big).$$

The desired conditional expectation is determined uniquely from the following identity: for any bounded, compactly supported in $t$ and $\mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R})$-measurable function $\varphi(\omega, t, y)$

$$E \int_0^\infty \int \varphi(\omega, t, y)\, \Delta I_t(i)\, p(dt, dy) = E \int_0^\infty \int \varphi(\omega, t, y)\, M\big( \Delta I(i) \,\big|\, \mathcal{P}(Y) \otimes \mathcal{B}(\mathbb{R}) \big)(\omega, t, y)\, q(dt, dy).$$

By (A.2),

$$\Delta I_t(1) = -I_{t-}(1) \Delta N_{12}(t) + I_{t-}(4) \Delta N_{41}(t),$$
$$\Delta I_t(2) = I_{t-}(1) \Delta N_{12}(t) - I_{t-}(2) \Delta N_{23}(t),$$

and so

$$\Delta I_t(1)\, p(dt, \{1\}) = I_{t-}(4)\,dN_{41}(t),$$
$$\Delta I_t(1)\, p(dt, \{-1\}) = -I_{t-}(1)\,dN_{12}(t),$$
$$\Delta I_t(2)\, p(dt, \{1\}) = -I_{t-}(2)\,dN_{23}(t),$$
$$\Delta I_t(2)\, p(dt, \{-1\}) = I_{t-}(1)\,dN_{12}(t).$$

Owing to the obvious relations

$$I_t(4) = I_t(4)(1 - Y_t), \quad I_t(2) = I_t(2)(1 - Y_t),$$
$$I_t(1) = I_t(1) Y_t, \quad I_t(3) = I_t(3) Y_t$$

we have

$$\pi_{t-}(4)\,dt = \pi_{t-}(4)(1 - Y_{t-})\,dt, \quad \pi_{t-}(2)\,dt = \pi_{t-}(2)(1 - Y_{t-})\,dt,$$
$$\pi_{t-}(1)\,dt = \pi_{t-}(1) Y_{t-}\,dt, \quad \pi_{t-}(3)\,dt = \pi_{t-}(3) Y_{t-}\,dt.$$

Taking into account (A.3), we find

$$H_1(\omega, t, y) = \begin{cases} \pi_{t-}(4), & y = 1, \\ -\pi_{t-}(1), & y = -1, \end{cases} \qquad H_2(\omega, t, y) = \begin{cases} -\pi_{t-}(2), & y = 1, \\ \pi_{t-}(1), & y = -1. \end{cases}$$

In accordance with (A.3), (A.4), the formulae for $H_1$, $H_2$, and (A.7), we transform (A.6) to

$$d\pi_t(1) = (-\pi_t(1) + \pi_t(4))\,dt + \pi_{t-}(4)(1 - Y_{t-})(dY_t - dt) + \pi_{t-}(1) Y_{t-}(dY_t + dt)$$
$$= \pi_{t-}(4)(1 - Y_{t-})\,dY_t + \pi_{t-}(1) Y_{t-}\,dY_t$$
$$= (1 - \pi_{t-}(2))(1 - Y_{t-})\,dY_t + \pi_{t-}(1) Y_{t-}\,dY_t,$$

$$d\pi_t(2) = (\pi_t(1) - \pi_t(2))\,dt - \pi_{t-}(2)(1 - Y_{t-})(dY_t - dt) - \pi_{t-}(1) Y_{t-}(dY_t + dt)$$
$$= -\pi_{t-}(2)(1 - Y_{t-})\,dY_t - \pi_{t-}(1) Y_{t-}\,dY_t.$$